Evolutionary Applications. 2018;1–18. | wileyonlinelibrary.com/journal/eva
Received: 31 August 2017 | Accepted: 21 December 2017 | DOI: 10.1111/eva.12593
SPECIAL ISSUE ORIGINAL ARTICLE
Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales
Oscar E. Gaggiotti1 | Anne Chao2 | Pedro Peres-Neto3 | Chun-Huo Chiu4 | Christine Edwards5 | Marie-Josée Fortin6 | Lou Jost7 | Christopher M. Richards8 | Kimberly A. Selkoe9,10
1 School of Biology, Scottish Oceans Institute, University of St Andrews, St Andrews, UK
2 Institute of Statistics, National Tsing Hua University, Hsin-Chu, Taiwan
3 Department of Biology, Concordia University, Montreal, QC, Canada
4 Department of Agronomy, National Taiwan University, Taipei, Taiwan
5 Center for Conservation and Sustainable Development, Missouri Botanical Garden, Saint Louis, MO, USA
6 Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada
7 Ecominga Fundation, Banos, Tungurahua, Ecuador
8 Plant Germplasm Preservation Research Unit, USDA-ARS, Fort Collins, CO, USA
9 National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, CA, USA
10 Hawai’i Institute of Marine Biology, University of Hawai’i at Mānoa, Kaneohe, HI, USA
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2018 The Authors. Evolutionary Applications published by John Wiley & Sons Ltd
Correspondence: Oscar E. Gaggiotti, School of Biology, Scottish Oceans Institute, University of St Andrews, St Andrews, UK. Email: oeg@st-andrews.ac.uk
Funding information: US National Natural Science Foundation (BioOCE Award), Grant/Award Number: 1260169; The Marine Alliance for Science and Technology for Scotland (Scottish Funding Council), Grant/Award Number: HR09011; National Science Foundation, Grant/Award Number: DBI-1300426; The University of Tennessee; NOAA Coral Reef Conservation Program; the Ministry of Science and Technology, Taiwan; Canada Research Chair in Spatial Modelling and Biodiversity
Abstract
Biological diversity is a key concept in the life sciences and plays a fundamental role in many ecological and evolutionary processes. Although biodiversity is inherently a hierarchical concept covering different levels of organization (genes, population, species, ecological communities and ecosystems), a diversity index that behaves consistently across these different levels has so far been lacking, hindering the development of truly integrative biodiversity studies. To fill this important knowledge gap, we present a unifying framework for the measurement of biodiversity across hierarchical levels of organization. Our weighted, information-based decomposition framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon’s entropy. We investigated the numerical behaviour of our approach with simulations and showed that it can accurately describe complex spatial hierarchical structures. To demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components (alleles, species, etc.), we applied the framework to a real data set on coral reef biodiversity. We expect our framework will have multiple applications covering the fields of conservation biology, community genetics and eco-evolutionary dynamics.
KEYWORDS
biodiversity indices, genetic diversity, hierarchical spatial structure, Hill numbers, species diversity
GAGGIOTTI et al.
1 | INTRODUCTION
Biological diversity is a foundational concept in the life sciences and critical to strategies for ecological conservation. However, for many decades, biodiversity has been treated in a piecemeal manner, with ecologists focusing on species diversity (but more recently also on trait and phylogenetic diversity) and population geneticists focusing on genetic diversity. This dichotomy has led to large differences in the type of diversity indices that have been used to measure species, trait, phylogenetic and genetic diversity. Ecologists were initially focused on empirical developments and generated a very large number of species diversity indices that strongly differ in their numerical behaviour (Jost, 2006) and estimation properties (Bunge, Willis, & Walsh, 2014). On the other hand, population genetics was initially dominated by theoretical developments and mathematical models focused on a specific set of parameters that described genetic diversity within and among populations, which led to the development of a restricted set of genetic diversity indices. Thus, although biodiversity is inherently a hierarchical concept covering different levels of organization (genetic, population, species, ecological communities and ecosystems), the lack of diversity indices that behave consistently across these different levels has precluded the development of truly integrative biodiversity studies.
Recently, motivated by this lack of common measures for biodiversity at different levels of biological organization, population geneticists have carried out methodological developments that extend the use of popular species diversity indices to the measurement of genetic diversity at different levels of spatial subdivision [e.g., Shannon’s and Simpson’s indices (Sherwin, Jabot, Rush, & Rossetto, 2006; Smouse, Whitehead, & Peakall, 2015)]. However, simply adapting species diversity measures is not sufficient, for two reasons. First, there is much controversy over how to quantify abundance-based species diversity in a community (Mendes, Evangelista, Thomaz, Agostinho, & Gomes, 2008). Second, there has been little agreement on how to partition diversity into its spatial components (Ellison, 2010). A promising solution for a unified measure of genetic diversity centres on Hill numbers (Hill, 1973). Indeed, a consensus is emerging on the use of Hill numbers as a unifying concept to define measures of various types of diversity, including species, phylogenetic and functional diversities (Chao, Chiu, & Jost, 2014). Importantly, Hill numbers follow the replication principle, ensuring that diversity measures are linear in relation to group pooling. As such, they can be used to develop proper partition schemes across spatial scales or other hierarchical structures, such as populations within metapopulations, species within phylogenies and communities within ecosystems, and to pool information across different levels in a hierarchy.
The purpose of this study was to present a unifying framework for the measurement of biodiversity across hierarchical levels of organization, from local population to ecosystem levels. We expect that this new framework will be a useful tool for conservation biologists and will also facilitate the development of the fields of community genetics (Agrawal, 2003) and eco-evolutionary dynamics (Hendry, 2013). This new framework may also facilitate bridging community ecology processes (selection among species, drift, dispersal and speciation) and the processes emphasized by population genetics theory (selection within species, drift, gene flow and mutation), as explored by Vellend et al. (2014). The paper starts by outlining historical developments on the formulation and use of biodiversity measures in the fields of ecology and population genetics (Section 2). We then provide an overview of the use of Hill numbers in ecology and their relationship with population genetic parameters such as Ne (Section 3). Section 4 presents a weighted information-based decomposition framework that provides measures of both genetic and species diversity at all hierarchical levels of spatial subdivision, from populations to ecosystems. This is followed by the description of software that implements the approach (Section 5). Section 6 explores patterns of species and genetic diversity under different spatial subdivision models using simulated data with known diversity hierarchical structures. Section 7 shows an application to a real data set on coral reef biodiversity (Selkoe et al., 2016). We close with a discussion of the advantages and limitations of our approach and its applications in the fields of conservation biology, community genetics and eco-evolutionary dynamics.
2 | HISTORICAL DEVELOPMENTS
Arguably, the ultimate reason for methodological divergence in diversity indices used by population geneticists and community ecologists resides in the very different contexts that led to the emergence of these two disciplines. Ecologists were interested in understanding the processes that determine the structure and composition of communities and could directly measure the community traits (number of species and their abundances) needed to compare different communities. This relatively easy access to real data and an initially limited interest in mechanistic models fostered the development of a large number of diversity measures to explore species distributional data (Magurran, 2004) and eventually made the quantification of abundance-based species diversity one of the most controversial issues in ecology. Population genetics, on the other hand, arose in response to a need to reconcile two opposing views of evolution that hinged on the type of diversity upon which natural selection acted. Darwin proposed that it was small continuous variation, while Galton believed that natural selection acted upon large discontinuous variation (Provine, 1971). Variation, in this case, was an abstract concept and could not be directly measured, which motivated the development of a vast body of theory centred around mathematical models describing the behaviour of a restricted set of diversity measures (Provine, 1971).
Although ecologists and population geneticists use very different approaches to measure diversity, they are both interested in describing spatial patterns by decomposing total diversity into within- and among-community/population components. But here again, methodological developments differ greatly between the two disciplines. Ecologists engaged in intense debates on the choice of partitioning schemes (Jost, 2007), while population geneticists remained largely faithful to the use of so-called fixation indices proposed by Wright (1951). Nevertheless, the recently established fields of molecular
ecology, community genetics and eco-evolutionary dynamics are helping to foster a convergence between the methods used to measure species and genetic diversity. Indeed, in the last decade, population geneticists have begun to extend the use of popular species diversity metrics to the measurement of genetic diversity by deriving mathematical expressions linking them with evolutionary parameters such as effective population size and mutation and migration rates (Chao et al., 2015; Sherwin, 2010; Sherwin et al., 2006; Smouse et al., 2015).
Regardless of this very recent methodological convergence, ecologists and population geneticists face the same challenges when trying to characterize how diversity components (alpha, beta) are structured geographically. These problems have been described in great detail in the literature (e.g., see Jost, 2007, 2010), so here we will only give a very brief summary. The first problem is that the commonly used within-community and within-population abundance diversity measures (e.g., Shannon-Wiener index and heterozygosity) are in fact entropies, meaning that they quantify the uncertainty in the species or allele identity of randomly sampled individuals or alleles, respectively. Importantly, these indices do not scale linearly with an increase in diversity, and some of them (e.g., heterozygosity) reach an asymptote for large values. The second problem is that the “within-” (alpha) and “between-” (beta) components of diversity are not independent. Intuitively, if beta depends on alpha, it would be impossible to compare beta diversities across all levels at which alpha diversities differ.
Partitioning components of diversity is central to progress on these problems. Ecologists have related the traditional alpha, beta and gamma diversity using both additive and multiplicative schemes of partitioning. On the other hand, population geneticists have always used the multiplicative scheme based on the partitioning of the probability of identity by descent of pairs of alleles (inbreeding coefficients, F). Although there has been some confusion (cf. Jost, 2008; Jost et al., 2010; Meirmans & Hedrick, 2011), it is easy to demonstrate that all estimators of $F_{ST}$, a parameter that quantifies genetic structure, including $G_{ST}$ (Nei, 1973) and θ (Weir & Cockerham, 1984), are based on the well-known multiplicative decomposition of Wright’s (1951) F-statistics, $(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST})$, where all terms are entropy measures describing the uncertainty in the identity by descent of pairs of alleles when they are sampled from the whole set of populations (metapopulation), $(1 - F_{IT})$; from within the same population, $(1 - F_{IS})$; or from two different populations, $(1 - F_{ST})$.
As mentioned earlier, ecologists engaged in intense debates on how to partition species diversity, but in a recent Ecology forum (Ellison, 2010), contributors agreed that a first step towards reaching a consensus was to adopt Hill numbers to measure diversity. Discussions among population geneticists are less advanced because of their traditional focus on the use of genetic polymorphism data to estimate important evolutionary parameters, which requires that genetic diversity statistics be effective measures of the causes and consequences of genetic differentiation (e.g., Whitlock, 2011). Much theoretical work is still needed to demonstrate that diversity measures based on information theory do satisfy this requirement. Here, instead, we argue that the adoption of Hill numbers in population genetics is also a good starting point to reach a consensus on how to partition genetic diversity. In what follows, we first introduce Hill numbers and then present a weighted information-based decomposition framework applicable to both community and population genetics studies.
3 | OVERVIEW OF HILL NUMBERS
There are now many articles describing the application of Hill numbers. Here we follow Jost (2006), who reintroduced their use in ecology. As Jost (2006) noted, most diversity indices are in fact entropies that measure the uncertainty in the identity of species (or alleles) in a sample. However, true diversity measures should provide estimates of the number of distinct elements (species or alleles) in an aggregate (community or population). To derive such measures, we first note that diversity indices create equivalence classes among aggregates, in the sense that all aggregates with the same diversity index value can be considered as equivalent. For example, all populations with the same heterozygosity value are equivalent in terms of this index, even if they have radically different allele frequencies (see Appendix S1 for an example). Moreover, for any given heterozygosity, there will be an “ideal” population in which all alleles are equally frequent. It is therefore possible to define an “effective number of elements” (alleles in this example) as the number of equally frequent elements in an “ideal aggregate” that has the same diversity index value as the “real aggregate.” An example of effective number in an ecological context is the effective number of species introduced by MacArthur (1965), while an equivalent concept in population genetics is the effective number of alleles (Kimura & Crow, 1964).
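The notion of equivalence classes can be illustrated with a minimal Python sketch. It is not part of the original analysis: the function names and the two example populations are hypothetical, and the effective number of alleles is taken to be 1/(1 − heterozygosity), that is, the number of equally frequent alleles in the ideal population.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def effective_number_of_alleles(freqs):
    """Number of equally frequent alleles that yields the same heterozygosity."""
    return 1.0 / (1.0 - expected_heterozygosity(freqs))


# Two hypothetical populations with very different allele frequencies.
pop_a = [0.5, 0.5]
pop_b = [2 / 3, 1 / 6, 1 / 6]

for name, freqs in (("pop_a", pop_a), ("pop_b", pop_b)):
    print(name,
          round(expected_heterozygosity(freqs), 6),
          round(effective_number_of_alleles(freqs), 6))
```

Both populations have a heterozygosity of 0.5 and therefore fall in the same equivalence class, whose ideal member carries two equally frequent alleles, even though pop_b has three alleles.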
Note that the concept of effective population size, Ne, used in population genetics is analogous to that of Hill numbers but is based on a rather different concept. More precisely, Ne is defined as the number of individuals in an ideal (Wright–Fisher) population that has the same magnitude of random genetic drift as the real population being studied. There are different ways in which we can measure the strength of genetic drift, the most common being change in average inbreeding coefficient, change in allele frequency variance and rate of loss of heterozygosity, and each leads to a different type of effective size. Thus, the ideal and the real populations are equivalent in terms of the rate of loss of genetic diversity and not in terms of equal representation of distinct individuals. Probably the only similarity between Ne and the rationale underlying Hill numbers is in the sense that all the individuals in the ideal population contribute equally (on average) to the gene pool of the next generation.
The application of the above-stated logic to any of the many different entropy measures used in ecology and population genetics yields a single expression for diversity:
$${}^{q}D \equiv \left( \sum_{i=1}^{S} p_i^{q} \right)^{1/(1-q)}, \qquad (1)$$
where S denotes the number of species or alleles, $p_i$ denotes the relative abundance or frequency of species or allele i, and the exponent and superscript q is the order of the diversity and indicates the sensitivity of ${}^{q}D$, the numbers equivalent of the diversity measure being used, to common and rare elements (Jost, 2006). The diversity of order zero (q = 0) is completely insensitive to species or allele frequencies and is known, respectively, as species or allelic richness, depending on whether it is applied to species or allele frequency data. The diversity of order one (q = 1) weights the contribution of each species or allele by their frequency without favouring either common or rare species/alleles. Although Equation 1 is not defined for q = 1, its limit exists (Jost, 2006):
$${}^{1}D = \exp\left( -\sum_{i=1}^{S} p_i \ln p_i \right) = \exp(H), \qquad (2)$$
where H is the Shannon entropy. All values of q greater than unity disproportionally favour the most common species or allele. For example, the Simpson concentration and the Gini–Simpson index, which are, respectively, equivalent to expected homozygosity and expected heterozygosity when applied to allele frequency data, lead to diversities of order 2 and give the same effective number of species or alleles:
$${}^{2}D = 1 \Big/ \left( \sum_{i=1}^{S} p_i^{2} \right). \qquad (3)$$
It is worth emphasizing that among all these different number equivalents or true diversity measures, the diversity of order 1 is key because of its ability to weigh elements precisely by their frequency, without favouring either rare or common elements (Jost, 2006). Therefore, we will use this measure to define our new framework for diversity decomposition.
4 | WEIGHTED INFORMATION-BASED DECOMPOSITION FRAMEWORK (q = 1)
Our decomposition framework is focused on the information-based diversity measure (Hill number of order q = 1). In what follows, we first describe the framework in terms of abundance (species/genetic) diversities, and then we provide an equivalent formulation in terms of phylogenetic diversity. For simplicity, we will use the notation D to refer to abundance diversities and PD to refer to phylogenetic diversities, both of order q = 1. Appendix S2 lists all notation and definitions of the parameters and variables we used.
4.1 | Formulation in terms of abundance diversity
Here, we develop a framework applicable to both species (abundance, presence–absence, biomass) and genetic data to estimate alpha, beta and gamma diversities (i.e., diversity components) across different levels of a hierarchical spatial structure. In this section, we consider a very simple example of an ecosystem subdivided into multiple regions, each of which, in turn, is subdivided into a number of communities when considering species data, or a number of populations when considering genetic data. However, our formulation is applicable to any number of levels within a spatially hierarchical partitioning scheme and their associated number of communities and populations at each level (nested scale), such as the example considered in our simulation study below (see Figure 1). Indeed, the framework described here allows decomposing species and genetic information on an equal footing, thus allowing contrasting diversity components across communities and populations. In other words, if genetic and species abundance (or presence–absence) data are available for every population and every species, then genetic and species diversity components can be contrasted within and among spatial scales, as well as across different phylogenetic levels. Note that our proposed framework is based on diversities of order q = 1, which are less sensitive than diversities of higher order to the fact that genetic information is not available for all individuals in a population but rather based on subsamples of individuals within populations. As such, using q = 1 allows one to decompose genetic variation consistently across different spatial subdivision levels that may vary in abundance.
The final objective was to decompose the global (ecosystem) diversity into its regional and community/population-level components. We do this using the well-known additive property of Shannon entropy across hierarchical levels (and thus multiplicative partitioning of diversity) (Batty, 1976; Jost, 2007). Table 1 presents the diversities (number equivalents) that need to be estimated at each level of the hierarchy. For each level, there will be one value corresponding to species diversity and another corresponding to allelic (genetic) diversity of a particular species at a given locus (or an average across loci). Figure S1 provides a schematic representation of the calculation of diversities.
From Table 1, it is apparent that we only need to use Equation 2 to calculate three diversity indices, namely $D_{\alpha}^{(1)}$, $D_{\alpha}^{(2)}$ and $D_{\gamma}$. These diversity measures are defined in terms of relative abundances of the distinct elements (species or alleles) at the respective levels of the hierarchy. In what follows, we first present the framework as applied to allele count data and then explain how a simple change in the definition of a single parameter allows the application of the same framework to species abundance data. We assume that we are considering a diploid species (but the scheme can be easily generalized for polyploid species) and focus on the diversity of order q = 1, which is based on the Shannon entropy (see Equation 1).
Genetic diversity indices are calculated separately for each locus, so we focus here on a locus with S alleles. Additionally, we consider an ecosystem subdivided into K regions, each having $J_k$ local populations. Let $N_{injk}$ be the number of diploid individuals with n (= 0, 1, 2) copies of allele i in population j and region k. Then, the total number of copies of allele i in population j and region k is $N_{ijk} = \sum_{n=0}^{2} n N_{injk}$, and from this we can derive the total number of alleles in population j and region k as $N_{+jk} = \sum_{i=1}^{S} N_{ijk}$, the total number of alleles in region k as $N_{++k} = \sum_{j=1}^{J_k} N_{+jk}$ and the total number of alleles in the ecosystem as $N_{+++} = \sum_{k=1}^{K} N_{++k}$. All allele frequencies can be derived from these allele counts. For example, the relative frequency of allele i in any given population j within region k is $p_{i|jk} = N_{ijk}/N_{+jk}$. In the case of region- and ecosystem-level allele frequencies, we pool over populations within regions and over all regions and populations within an ecosystem, respectively. We define the weight for population j and region k as $w_{jk} = N_{+jk}/N_{+++}$; the weight for region k thus becomes $w_{+k} = \sum_{j=1}^{J_k} w_{jk} = N_{++k}/N_{+++}$. Table 2 describes how allele/species relative frequencies at each level are calculated in terms of these weight functions.
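The bookkeeping from genotype counts to weights and pooled frequencies can be sketched in Python as follows. This is an illustrative sketch only: the nested-dictionary layout, the function names and the toy data are assumptions introduced here and do not come from the R implementation; the formulas are those stated in the text and in Table 2.

```python
def allele_copy_counts(genotype_counts):
    """N_ijk = sum_n n * N_injk, for genotype_counts[k][j][i] = [N_i0, N_i1, N_i2]."""
    return {
        k: {j: {i: sum(n * c for n, c in enumerate(by_copies))
                for i, by_copies in alleles.items()}
            for j, alleles in pops.items()}
        for k, pops in genotype_counts.items()
    }


def weights_and_frequencies(counts):
    """Weights w_jk, w_+k and frequencies p_i|jk, p_i|+k, p_i|++ (Table 2)."""
    n_pop = {(k, j): sum(alleles.values())               # N_+jk
             for k, pops in counts.items() for j, alleles in pops.items()}
    n_total = sum(n_pop.values())                         # N_+++
    w_pop = {kj: n / n_total for kj, n in n_pop.items()}  # w_jk
    w_region = {}
    for (k, _), w in w_pop.items():
        w_region[k] = w_region.get(k, 0.0) + w            # w_+k
    p_pop = {(k, j): {i: c / n_pop[(k, j)] for i, c in counts[k][j].items()}
             for (k, j) in n_pop}
    p_region = {k: {} for k in counts}
    p_eco = {}
    for (k, j), freqs in p_pop.items():
        for i, p in freqs.items():
            # Weighted pooling within regions and over the whole ecosystem.
            p_region[k][i] = p_region[k].get(i, 0.0) + w_pop[(k, j)] / w_region[k] * p
            p_eco[i] = p_eco.get(i, 0.0) + w_pop[(k, j)] * p
    return w_pop, w_region, p_pop, p_region, p_eco


# Hypothetical bi-allelic locus: [individuals with 0, 1, 2 copies of the allele].
genotypes = {
    "R1": {"a": {"A1": [2, 4, 4], "A2": [4, 4, 2]},
           "b": {"A1": [5, 0, 0], "A2": [0, 0, 5]}},
    "R2": {"c": {"A1": [0, 10, 0], "A2": [0, 10, 0]}},
}
counts = allele_copy_counts(genotypes)
w_pop, w_region, p_pop, p_region, p_eco = weights_and_frequencies(counts)
```

Because the weights are proportional to the number of allele copies, the weighted averages coincide with the frequencies obtained by pooling raw counts; for instance, the frequency of A1 in region R1 is 12/30 either way.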
Using these frequencies, we can calculate the genetic diversities at each level of spatial organization. Table 3 presents the formulas for $D_{\alpha}^{(1)}$, $D_{\alpha}^{(2)}$ and $D_{\gamma}$; all other diversity measures can be derived from them (see Table 1). In the case of the ecosystem diversity, this amounts to simply replacing $p_i$ in Equation 2 by $p_{i|++}$, the allele frequency at the ecosystem level (see Table 2). To calculate the diversity at the regional level, we first calculate the entropy $H_{\alpha k}^{(2)}$ for each individual region k and then obtain the weighted average over all regions, $H_{\alpha}^{(2)}$. Finally, we calculate the exponential of the region-level entropy to obtain $D_{\alpha}^{(2)}$, the alpha diversity at the regional level. We proceed in a similar fashion to obtain $D_{\alpha}^{(1)}$, the diversity at the population level, but in this case we need to average over regions and populations within regions.
The calculation of the equivalent diversities based on species count data can be carried out using the exact same procedure described above, but in this case $N_{ijk}$ represents the number of individuals of species i in population j and region k. All formulas for gamma, alpha and beta, along with the differentiation measures at each level, are given in Table 3. The formulas can be directly generalized to any arbitrary number of levels (see Section 5).
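The three-level decomposition described in this section can be sketched in Python as follows. This is a minimal sketch rather than the authors' R implementation: the function names, the dictionary layout `counts[region][population][element]` and the toy data are assumptions made for illustration, and the code assumes at least two regions, at least one region with more than one population and no empty populations, so that the denominators of the differentiation measures are nonzero.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy H = -sum p_i ln p_i computed from a dict of counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def pool(count_dicts):
    """Add up several {element: count} dictionaries."""
    pooled = {}
    for counts in count_dicts:
        for element, c in counts.items():
            pooled[element] = pooled.get(element, 0) + c
    return pooled


def decompose(counts):
    """Order q = 1 partition for counts[region][population][element] = N_ijk."""
    region_counts = {k: pool(pops.values()) for k, pops in counts.items()}
    eco_counts = pool(region_counts.values())
    n_total = sum(eco_counts.values())

    h_gamma = shannon_entropy(eco_counts)
    h_alpha2 = h_alpha1 = 0.0  # weighted mean entropies at levels 2 and 1
    h_w2 = h_w1 = 0.0          # entropies of the weights (denominators of Delta)
    for k, pops in counts.items():
        w_region = sum(region_counts[k].values()) / n_total
        h_alpha2 += w_region * shannon_entropy(region_counts[k])
        h_w2 -= w_region * math.log(w_region)
        for pop_counts in pops.values():
            w_pop = sum(pop_counts.values()) / n_total
            h_alpha1 += w_pop * shannon_entropy(pop_counts)
            h_w1 -= w_pop * math.log(w_pop / w_region)

    d_gamma, d_alpha2, d_alpha1 = (math.exp(h) for h in (h_gamma, h_alpha2, h_alpha1))
    return {
        "D_gamma": d_gamma,
        "D_alpha2": d_alpha2, "D_beta2": d_gamma / d_alpha2,
        "D_alpha1": d_alpha1, "D_beta1": d_alpha2 / d_alpha1,
        "Delta2": (h_gamma - h_alpha2) / h_w2,
        "Delta1": (h_alpha2 - h_alpha1) / h_w1,
    }


# Hypothetical extreme case: four equally sized communities sharing no species.
counts = {
    "R1": {"a": {"sp1": 10}, "b": {"sp2": 10}},
    "R2": {"c": {"sp3": 10}, "d": {"sp4": 10}},
}
result = decompose(counts)
```

In this extreme case, each community holds one effective species, each region two and the ecosystem four, so both beta diversities are close to 2 and both differentiation measures are close to their maximum of 1 (up to floating-point rounding). The product of the level-1 alpha diversity and the two beta diversities recovers the gamma diversity, as in Table 1.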
4.2 | Formulation in terms of phylogenetic diversity
We first present an overview of phylogenetic diversity measures applied to a single nonhierarchical case (henceforth referred to as single aggregate, for brevity) and then extend it to consider a hierarchically structured system.
4.2.1 | Phylogenetic diversity measures in a single aggregate
To formulate phylogenetic diversity in a single aggregate, we assume that all species or alleles in an aggregate are connected by a rooted ultrametric or nonultrametric phylogenetic tree, with all species/alleles as tip nodes. All phylogenetic diversity measures discussed below are computed from a given fixed tree base, or a time reference point that is ancestral to all species/alleles in the aggregate. A convenient time
FIGURE 1 The spatial representation of 32 populations organized into a spatial hierarchy based on three scale levels: subregions (eight populations each), regions (16 populations each) and the ecosystem (all 32 populations). The dendrogram (upper panel: hierarchical representation of levels) represents the spatial relationship (i.e., geographic distance) in which each tip represents a population found in a particular site (lower panel). The cartographic representation (lower panel) represents the spatial distribution of these same populations along a geographic coordinate system
reference point is the age of the root of the phylogenetic tree spanned by all elements. Assume that there are B branch segments in the tree, and thus there are B corresponding nodes, B ≥ S. The set of species/alleles is expanded to include also the internal nodes as well as the terminal nodes representing species/alleles, which will then be the first S elements (see Figure S2).
Let $L_i$ denote the length of branch i in the tree, i = 1, 2, …, B. We first expand the set of relative abundances of elements $(p_1, p_2, \ldots, p_S)$ (see Equation 1) to a larger set $\{a_i, i = 1, 2, \ldots, B\}$ by defining $a_i$ as the total relative abundance of the elements descended from the ith node/branch, i = 1, 2, …, B. In phylogenetic diversity, an important parameter is the mean branch length $\bar{T}$, the abundance-weighted mean of the distances from the tree base to each of the terminal branch tips, that is, $\bar{T} = \sum_{i=1}^{B} L_i a_i$. For an ultrametric tree, the mean branch length is simply reduced to the tree depth T; see Figure 1 in Chao, Chiu, and Jost (2010) for an example. For simplicity, our following formulation of phylogenetic diversity is based on ultrametric trees. The extension to nonultrametric trees is straightforward (via replacing T by $\bar{T}$ in all formulas).
Chao et al. (2010, 2014) generalized Hill numbers to a class of phylogenetic diversity of order q, ${}^{q}PD$, derived as
$${}^{q}PD = \left\{ \sum_{i=1}^{B} L_i \left( \frac{a_i}{T} \right)^{q} \right\}^{1/(1-q)}. \qquad (4)$$
This measure quantifies the effective total branch length during the time interval from T years ago to the present. If q = 0, then ${}^{0}PD = \sum_{i=1}^{B} L_i$, which is the well-known Faith’s PD, the sum of the branch lengths of a phylogenetic tree connecting all species. However, this measure does not consider species abundances. Rao’s quadratic entropy Q (Rao & Nayak, 1985) is a widely used measure which takes into account both phylogeny and species abundances. This measure is a generalization of the Gini–Simpson index and quantifies the average phylogenetic distance between any two individuals randomly selected from the assemblage. Chao et al. (2010) showed that the ${}^{q}PD$ measure of order q = 2 is a simple transformation of quadratic entropy, that is, ${}^{2}PD = T/(1 - Q/T)$. Again, here we focus on the ${}^{q}PD$ measure of order q = 1, which can be expressed as a function of the phylogenetic entropy (Allen, Kon, & Bar-Yam, 2009):
$${}^{1}PD = \lim_{q \to 1} {}^{q}PD = \exp\left[ -\sum_{i=1}^{B} L_i \frac{a_i}{T} \ln\left( \frac{a_i}{T} \right) \right] \equiv T \exp(I/T). \qquad (5)$$
Here, I denotes the phylogenetic entropy
$$I = -\sum_{i=1}^{B} L_i a_i \ln a_i, \qquad (6)$$
which is a generalization of Shannon’s entropy that incorporates phylogenetic distances among elements. Note that when there are only tip nodes and all branches have unit length, then we have T = 1 and ${}^{q}PD$ reduces to the Hill number of order q (in Equation 1).
4.2.2 | Phylogenetic diversity decomposition in a multiple-level hierarchically structured system
The single-aggregate formulation can be extended to consider a hierarchical spatially structured system. For the sake of simplicity, we consider three levels (ecosystem, region and community/population), as we did for the species/allelic diversity decomposition. Assume that there are S elements in the ecosystem. For the rooted phylogenetic tree spanned by all S elements in the ecosystem, we define root (or a time reference point), number of nodes/branches B and branch length $L_i$ in a similar manner as those in a single aggregate.
For the tip nodes, as in the framework of species and allelic diversity (in Table 2), define $p_{i|jk}$, $p_{i|+k}$ and $p_{i|++}$, i = 1, 2, …, S, as the ith species or allele relative frequencies at the population, regional and ecosystem level, respectively. To expand these relative frequencies to the branch set, we define $a_{i|jk}$, i = 1, 2, …, B, as the summed relative abundance of the species/alleles descended from the ith node/branch in population j and region k, with similar definitions for $a_{i|+k}$ and $a_{i|++}$, i = 1, 2, …, B; see Figure 1 of Chao et al. (2015) for an illustrative example. The decomposition for phylogenetic diversity is similar to that for Hill numbers presented in Table 1, except that now all measures are replaced by phylogenetic diversity. The corresponding phylogenetic gamma, alpha and beta diversities at each level are
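TABLE 1 Various diversities in a hierarchically structured system and their decomposition based on diversity measure $D = {}^{1}D$ (Hill number of order q = 1 in Equation 2); for phylogenetic diversity decomposition, replace D with $PD = {}^{1}PD$ (phylogenetic diversity measure of order q = 1 in Equation 5); see Table 3 for all formulas for D and PD. The superscripts (1) and (2) denote the hierarchical level of focus
Hierarchical level | Diversity within | Diversity between | Diversity total | Decomposition
3 Ecosystem | − | − | $D_{\gamma}$ | $D_{\gamma} = D_{\alpha}^{(1)} D_{\beta}^{(1)} D_{\beta}^{(2)}$
2 Region | $D_{\alpha}^{(2)}$ | $D_{\beta}^{(2)} = D_{\gamma}^{(2)} / D_{\alpha}^{(2)}$ | $D_{\gamma}^{(2)} = D_{\gamma}$ | $D_{\gamma} = D_{\alpha}^{(2)} D_{\beta}^{(2)}$
1 Community or population | $D_{\alpha}^{(1)}$ | $D_{\beta}^{(1)} = D_{\gamma}^{(1)} / D_{\alpha}^{(1)}$ | $D_{\gamma}^{(1)} = D_{\alpha}^{(2)}$ | $D_{\alpha}^{(2)} = D_{\alpha}^{(1)} D_{\beta}^{(1)}$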
TABLE 2 Calculation of allele/species relative frequencies at the different levels of the hierarchical structure
Hierarchical level | Species/allele relative frequency
Population | $p_{i|jk} = N_{ijk}/N_{+jk} = N_{ijk} \big/ \sum_{i=1}^{S} N_{ijk}$
Region | $p_{i|+k} = N_{i+k}/N_{++k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$
Ecosystem | $p_{i|++} = N_{i++}/N_{+++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk}$
given in Table 3, along with the corresponding differentiation measures. Appendix S3 presents all mathematical derivations and discusses the desirable monotonicity and “true dissimilarity” properties that our proposed differentiation measures possess.
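The single-aggregate measure of Section 4.2.1 (Equations 4 to 6) can be sketched in Python as follows. The sketch is illustrative only: the representation of a tree as a list of (branch length, descendant tips) pairs, the function name `phylo_hill` and the toy trees are assumptions introduced here, and the tree is assumed to be ultrametric so that the abundance-weighted mean branch length equals the tree depth T.

```python
import math


def phylo_hill(branches, tip_freqs, q):
    """qPD of Equation 4 (Equation 5 when q == 1) for an ultrametric tree.

    branches: list of (length L_i, set of tips descended from branch i).
    tip_freqs: {tip: relative abundance}, summing to 1.
    """
    lengths = [length for length, _ in branches]
    # a_i: summed relative abundance of the tips descended from branch i.
    a = [sum(tip_freqs[t] for t in tips) for _, tips in branches]
    # Abundance-weighted mean branch length; equals T for an ultrametric tree.
    depth = sum(L * ai for L, ai in zip(lengths, a))
    if q == 1:
        # Phylogenetic entropy I (Equation 6), then 1PD = T * exp(I / T).
        entropy = -sum(L * ai * math.log(ai) for L, ai in zip(lengths, a) if ai > 0)
        return depth * math.exp(entropy / depth)
    total = sum(L * (ai / depth) ** q for L, ai in zip(lengths, a) if ai > 0)
    return total ** (1.0 / (1.0 - q))


# Hypothetical ultrametric tree of depth 2: tips A and B join one time unit
# before the present; their shared ancestor and tip C join at the root.
tree = [(1.0, {"A"}), (1.0, {"B"}), (2.0, {"C"}), (1.0, {"A", "B"})]
even = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
faith_pd = phylo_hill(tree, even, q=0)   # sum of all branch lengths
pd_order1 = phylo_hill(tree, even, q=1)

# Tip-only tree with unit branch lengths: qPD reduces to the Hill number.
star = [(1.0, {"A"}), (1.0, {"B"}), (1.0, {"C"})]
uneven = {"A": 0.5, "B": 0.25, "C": 0.25}
hill_order2 = phylo_hill(star, uneven, q=2)
```

For q = 0 the function returns Faith's PD (5 for the toy tree), and for the unit-length star tree it reproduces the ordinary Hill numbers, consistent with the reduction noted after Equation 6.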
5 | IMPLEMENTATION OF THE FRAMEWORK BY MEANS OF AN R PACKAGE
The framework described above has been implemented in the R function iDIP (information-based Diversity Partitioning), which is provided as Data S1. We also provide a short introduction with a simple example data set to explain how to obtain numerical results equivalent to those provided in Tables 4 and 5 below for the Hawaiian archipelago example data set.
The R function iDIP requires two input matrices:
1. Abundance data specifying species/alleles (rows) raw or relative abundances for each population/community (columns).
2. Structure matrix describing the hierarchical structure of spatial subdivision; see a simple example given in Data S1. There is no limit to the number of spatial subdivisions.
The output includes (i) gamma (or total) diversity, alpha and beta diversity for each level; (ii) proportion of total beta information (among aggregates) found at each level; and (iii) mean differentiation (dissimilarity) at each level.
We also provide the R function iDIP.phylo, which implements an information-based decomposition of phylogenetic diversity and therefore can take into account the evolutionary history of the species being studied. This function requires the two matrices mentioned above plus a phylogenetic tree in Newick format. For interested users without knowledge of R, we also provide an online version available from https://chao.shinyapps.io/iDIP/. This interactive web application was developed using Shiny (https://shiny.rstudio.com). The webpage contains tabs providing a short introduction describing how to use the tool, along with a detailed User’s Guide, which provides proper interpretations of the output through numerical examples.
6 | SIMULATION STUDY TO SHOW THE CHARACTERISTICS OF THE FRAMEWORK
Here, we describe a simple simulation study to demonstrate the utility and numerical behaviour of the proposed framework. We considered an ecosystem composed of 32 populations divided into four hierarchical levels (ecosystem, region, subregion, population; Figure 1). The number of populations at each level was kept constant across all simulations (i.e., ecosystem with 32 populations, regions with 16 populations each and subregions with eight
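TABLE 3 Formulas for α, β and γ, along with differentiation measures, at each hierarchical level of spatial subdivision for species/allelic diversity and phylogenetic diversity. Here $D = {}^{1}D$ (Hill number of order q = 1 in Equation 2); $PD = {}^{1}PD$ (phylogenetic diversity of order q = 1 in Equation 5); T denotes the depth of an ultrametric tree; H = Shannon entropy (Equation 2); I = phylogenetic entropy (Equation 6)
Level 3: Ecosystem
  gamma
    Species/allelic diversity: $D_{\gamma} = \exp\left\{ -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++} \right\} \equiv \exp(H_{\gamma})$
    Phylogenetic diversity: $PD_{\gamma} = T \times \exp\left\{ -\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++} \big/ T \right\} \equiv T \times \exp(I_{\gamma}/T)$
Level 2: Region
  gamma
    Species/allelic diversity: $D_{\gamma}^{(2)} = D_{\gamma}$
    Phylogenetic diversity: $PD_{\gamma}^{(2)} = PD_{\gamma}$
  alpha
    Species/allelic diversity: $D_{\alpha}^{(2)} = \exp\left( H_{\alpha}^{(2)} \right)$, where $H_{\alpha}^{(2)} = \sum_{k} w_{+k} H_{\alpha k}^{(2)}$ and $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$
    Phylogenetic diversity: $PD_{\alpha}^{(2)} = T \times \exp\left( I_{\alpha}^{(2)}/T \right)$, where $I_{\alpha}^{(2)} = \sum_{k} w_{+k} I_{\alpha k}^{(2)}$ and $I_{\alpha k}^{(2)} = -\sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k}$
  beta
    Species/allelic diversity: $D_{\beta}^{(2)} = D_{\gamma}^{(2)} / D_{\alpha}^{(2)}$
    Phylogenetic diversity: $PD_{\beta}^{(2)} = PD_{\gamma}^{(2)} / PD_{\alpha}^{(2)}$
Level 1: Population or community
  gamma
    Species/allelic diversity: $D_{\gamma}^{(1)} = D_{\alpha}^{(2)}$
    Phylogenetic diversity: $PD_{\gamma}^{(1)} = PD_{\alpha}^{(2)}$
  alpha
    Species/allelic diversity: $D_{\alpha}^{(1)} = \exp\left( H_{\alpha}^{(1)} \right)$, where $H_{\alpha}^{(1)} = \sum_{j,k} w_{jk} H_{\alpha jk}^{(1)}$ and $H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}$
    Phylogenetic diversity: $PD_{\alpha}^{(1)} = T \times \exp\left( I_{\alpha}^{(1)}/T \right)$, where $I_{\alpha}^{(1)} = \sum_{j,k} w_{jk} I_{\alpha jk}^{(1)}$ and $I_{\alpha jk}^{(1)} = -\sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}$
  beta
    Species/allelic diversity: $D_{\beta}^{(1)} = D_{\gamma}^{(1)} / D_{\alpha}^{(1)}$
    Phylogenetic diversity: $PD_{\beta}^{(1)} = PD_{\gamma}^{(1)} / PD_{\alpha}^{(1)}$
Differentiation among aggregates at each level
  Level 2: Among regions
    Species/allelic diversity: $\Delta_{D}^{(2)} = \dfrac{H_{\gamma} - H_{\alpha}^{(2)}}{-\sum_{k} w_{+k} \ln w_{+k}}$
    Phylogenetic diversity: $\Delta_{PD}^{(2)} = \dfrac{I_{\gamma} - I_{\alpha}^{(2)}}{-T \sum_{k} w_{+k} \ln w_{+k}}$
  Level 1: Population/community within region
    Species/allelic diversity: $\Delta_{D}^{(1)} = \dfrac{H_{\alpha}^{(2)} - H_{\alpha}^{(1)}}{-\sum_{j,k} w_{jk} \ln(w_{jk}/w_{+k})}$
    Phylogenetic diversity: $\Delta_{PD}^{(1)} = \dfrac{I_{\alpha}^{(2)} - I_{\alpha}^{(1)}}{-T \sum_{j,k} w_{jk} \ln(w_{jk}/w_{+k})}$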
8emsp |emsp emspensp GAGGIOTTI eT Al
emspensp emsp | emsp9GAGGIOTTI eT Al
populations each) Note that herewe used a hierarchywith fourspatialsubdivisionsinsteadofthreelevelsasusedinthepresenta-tion of the framework This decision was based on the fact thatwewantedtosimplifythepresentationofcalculations(threelevelsused)andinthesimulations(fourlevelsused)wewantedtoverifytheperformanceoftheframeworkinamorein-depthmanner
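The species/allelic column of Table 3 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' iDIP implementation: the function names `entropy` and `decompose`, the nested-dictionary input layout, and the choice of weights proportional to population size are all assumptions made here for the example.

```python
import math


def entropy(freqs):
    """Shannon entropy (natural log); zero frequencies contribute nothing."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def decompose(counts):
    """Three-level decomposition of order q = 1 (hypothetical helper).

    counts: {region: {population: [count of each allele or species]}}
    Assumes non-empty populations, one shared ordering of types, and
    weights w_jk proportional to population size. If every region holds
    a single population, the level-1 normalizing term is zero.
    """
    total = sum(sum(c) for pops in counts.values() for c in pops.values())
    n_types = len(next(iter(next(iter(counts.values())).values())))
    pooled = [0.0] * n_types              # p_{i++}
    h_alpha1 = h_alpha2 = norm1 = norm2 = 0.0
    for pops in counts.values():
        w_region = sum(sum(c) for c in pops.values()) / total  # w_{+k}
        region = [0.0] * n_types          # p_{i+k}
        for c in pops.values():
            w = sum(c) / total            # w_{jk}
            p = [x / sum(c) for x in c]   # within-population frequencies
            h_alpha1 += w * entropy(p)
            norm1 -= w * math.log(w / w_region)
            for i, p_i in enumerate(p):
                region[i] += (w / w_region) * p_i
                pooled[i] += w * p_i
        h_alpha2 += w_region * entropy(region)
        norm2 -= w_region * math.log(w_region)
    h_gamma = entropy(pooled)
    d_gamma, d_alpha2, d_alpha1 = (
        math.exp(h) for h in (h_gamma, h_alpha2, h_alpha1)
    )
    return {
        "D_gamma": d_gamma,
        "D_alpha2": d_alpha2,
        "D_alpha1": d_alpha1,
        "D_beta2": d_gamma / d_alpha2,
        "D_beta1": d_alpha2 / d_alpha1,
        "Delta2": (h_gamma - h_alpha2) / norm2,
        "Delta1": (h_alpha2 - h_alpha1) / norm1,
    }
```

With four equally sized populations in two regions that share no alleles at all, both differentiation measures returned by this sketch equal 1, and with four identical populations they equal 0, which matches the interpretation of Δ as a mean proportion of nonshared elements.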
We explored six scenarios varying in the degree of genetic structuring from very strong (Figure 2, top left panel) to very weak (Figure 2, bottom right panel), and for each we generated spatially structured genetic data for 10 unlinked bi-allelic loci using an algorithm loosely based on the genetic model of Coop, Witonsky, Di Rienzo, and Pritchard (2010). More explicitly, to generate correlated allele frequencies across populations for bi-allelic loci, we drew 10 random vectors of dimension 32 from a multivariate normal distribution with mean zero and a covariance matrix corresponding to the particular genetic structure scenario being considered. To construct the covariance matrix, we first assumed that the covariance between populations decreased with distance, so that the off-diagonal elements (covariances) for closest geographic neighbours were set to 4, for the second nearest neighbours were set to 3, and so on; as such, the main diagonal values (variance) were set to 5. By multiplying the off-diagonal elements of this variance–covariance matrix by a constant (δ), we manipulated the strength of the spatial genetic structure from strong (δ = 0.1, Figure 2) to weak (δ = 6, Figure 2). Delta values were chosen to demonstrate gradual changes in estimates across diversity components. Using this procedure, we generated a matrix of random normally distributed N(0, 1) deviates $\varepsilon_{il}$ for each population i and locus l. The random deviates were transformed into allele frequencies constrained between 0 and 1 using the simple transform
$$p_{il} = \begin{cases} 0 & \text{if } \varepsilon_{il} < 0 \\ \varepsilon_{il} & \text{if } 0 \le \varepsilon_{il} \le 1 \\ 1 & \text{if } \varepsilon_{il} > 1 \end{cases}$$
where $p_{il}$ is the relative frequency of allele A1 at the lth locus in population i, and therefore $q_{il} = (1 - p_{il})$ is the relative frequency of allele A2. Each bi-allelic locus was analysed separately by our framework, and estimated values of $D_\gamma$, $D_\alpha$, and $D_\beta$ for each spatial level (see Figure 1) were averaged across the 10 loci.
To simulate a realistic distribution of number of individuals across populations, we generated random values from a log-normal distribution with mean 0 and log of standard deviation 1; these values were then multiplied by randomly generated deviates from a Poisson distribution with λ = 30 to obtain a wide range of population/community sizes. Rounded values (to mimic abundances of individuals) were then multiplied by $p_{il}$ and $q_{il}$ to generate allele abundances. Given that number of individuals was randomly generated across populations, there is no spatial correlation in abundance of individuals across the landscape, which means that the genetic spatial patterns were solely determined by the variance–covariance matrix used to generate correlated allele frequencies across populations. This facilitates interpretation of the simulation results, allowing us to demonstrate that the framework can uncover subtle spatial effects associated with population connectivity (see below).
For each spatial structure, we generated 100 matrices of allele frequencies, and each matrix was analysed separately to obtain distributions for $D_\gamma$, $D_\alpha$, $D_\beta$, and $\Delta_D$. Figure 2 presents heat maps of the correlation in allele frequencies across populations for one simulated data set under each δ value and shows that our algorithm can generate a wide range of genetic structures comparable to those generated by other more complex simulation protocols (e.g., de Villemereuil, Frichot, Bazin, Francois, & Gaggiotti, 2014).
Figure 3 shows the distribution of $D_\alpha$, $D_\beta$, and $\Delta_D$ values for the three levels of geographic variation below the ecosystem level (i.e., $D_\gamma$, genetic diversity). The results clearly show that our framework detects differences in genetic diversity across different levels of spatial genetic structure. As expected, the effective number of alleles ($D_\alpha$ component, top row) increases per region and subregion as the spatial structure becomes weaker (i.e., from small to large δ values) but remains constant at the population level, as there is no spatial structure at this level (i.e., populations are panmictic), so diversity is independent of δ.
The $D_\beta$ component (middle row) quantifies the effective number of aggregates (regions, subregions, populations) at each hierarchical level of spatial subdivision. The larger the number of aggregates at a given level, the more heterogeneous that level is. Thus, it is also a measure of compositional dissimilarity at each level. We use this interpretation to describe the results in a more intuitive manner. As expected, as δ increases, dissimilarity between regions (middle left panel) decreases because spatial genetic structure becomes weaker, and the compositional dissimilarity among populations within subregions (middle right panel) increases because the strong spatial correlation among populations within subregions breaks down (Figure 3, centre left panel). The compositional dissimilarity between subregions within regions (Figure 3, middle centre panel) first increases and then decreases with increasing δ. This is due to an "edge effect" associated with the marginal status of the subregions at the extremes of the species range (extreme right and left subregions in Figure 2). As δ increases, the composition of the two subregions at the centre of the species range, which belong to different regions, changes more rapidly than that of the two marginal subregions. Thus, the compositional dissimilarity between subregions within regions increases. However, as δ continues to increase, spatial effects disappear and dissimilarity decreases.
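The last two steps of the simulation protocol (truncating a deviate to an allele frequency, then turning it into allele abundances for a randomly sized population) can be sketched as follows. This is only an illustration under stated assumptions: the deviates are passed in rather than drawn from the structured multivariate normal, the log-normal parameters are read as mean 0 and standard deviation 1 on the log scale, and the names `clamp01`, `poisson` and `allele_abundances` are hypothetical.

```python
import math
import random


def clamp01(eps):
    """Truncate a normal deviate to a valid allele frequency in [0, 1]."""
    return min(1.0, max(0.0, eps))


def poisson(lam, rng):
    """Poisson deviate by Knuth's multiplication method (fine for lam = 30)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def allele_abundances(deviates, rng, lam=30):
    """Return (abundance of A1, abundance of A2) for each population.

    Population size is a log-normal value times a Poisson deviate,
    rounded to mimic a count of individuals (assumed reading of the text).
    """
    rows = []
    for eps in deviates:
        p = clamp01(eps)
        size = round(rng.lognormvariate(0.0, 1.0) * poisson(lam, rng))
        rows.append((size * p, size * (1.0 - p)))
    return rows


# Three hypothetical deviates: below 0, inside [0, 1], above 1.
example = allele_abundances([-0.3, 0.25, 1.7], random.Random(1))
```

Because population sizes are drawn independently of the deviates, any spatial pattern in the resulting abundances comes only from the allele frequencies, which is the property the text relies on.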
FIGURE 2 Heat maps of allele frequency correlations between pairs of populations for different δ values. Delta values control the strength of the spatial genetic structure among populations, with low δs having the strongest spatial correlation among populations. Each heat map represents the outcome of a single simulation, and each dot represents the allele frequency correlation between two populations. Thus, the diagonal represents the correlation of a population with itself and is always 1 regardless of the δ value considered in the simulation. Colours indicate range of correlation values. As in Figure 1, the dendrograms represent the spatial relationship (i.e., geographic distance) between populations.
The differentiation component $\Delta_D$ (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity $D_\beta$. This is expected, as we kept the genetic variation equal across regions, subregions, and populations. If we had used a nonstationary spatial covariance matrix in which different δ values would be used among populations, subregions, and regions, then the beta and differentiation components would follow different trends in relation to the strength in spatial genetic variation.
FIGURE 3 Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta, and differentiation; total diversity, gamma, is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions, and regions).
For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ: $D_\gamma$ = 1.6 on average across simulations for δ = 0.1, up to $D_\gamma$ = 1.9 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here region, subregions, and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.
7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM
All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species: a deep-water fish, Etelis coruscans (Andrews et al., 2014), and a shallow-water fish, Zebrasoma flavescens (Eble et al., 2011).
The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction, and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string of uninhabited low islands, atolls, shoals, and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification, and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km south-west of the MHI at Johnston Atoll and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).
FIGURE 4 Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass.
We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.
7.1 | Species diversity
Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, $D_\gamma$, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D^{(2)}_\alpha$ = 37.77; Island: $D^{(1)}_\alpha$ = 27.75). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D^{(2)}_\beta$ = 1.29 represents the number of region equivalents in the Hawaiian archipelago, while $D^{(1)}_\beta$ = 1.361 is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain $\Delta_D$ (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI, and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
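As a quick arithmetic check, offered here as a reader's aid rather than as part of the original analysis, the beta diversities in Table 4 follow from the multiplicative relation β = γ/α in Table 3, with the gamma diversity at each level equal to the alpha diversity of the level above:

$$D^{(2)}_\beta = \frac{D^{(2)}_\gamma}{D^{(2)}_\alpha} = \frac{48.744}{37.773} \approx 1.290, \qquad D^{(1)}_\beta = \frac{D^{(1)}_\gamma}{D^{(1)}_\alpha} = \frac{37.773}{27.752} \approx 1.361,$$

so that $D_\gamma = D^{(1)}_\alpha \times D^{(1)}_\beta \times D^{(2)}_\beta$ up to rounding. The same identity need not hold exactly for the tabulated genetic values in Tables 5 and 6, presumably because those entries are averages across loci and an average of ratios generally differs from a ratio of averages.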
We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than those of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and to a lesser extent Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
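The comparison of orders q = 0, 1, 2 described above can be reproduced with a short function. The sketch below uses the standard definition of Hill numbers; the function name `hill_number` and the abundance vector are hypothetical and are not the Nihoa data.

```python
import math


def hill_number(abundances, q):
    """Hill number (effective number of species or alleles) of order q."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if q == 1:
        # Limit as q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))


# Hypothetical community dominated by one species (55% of individuals).
community = [55, 15, 10, 10, 5, 2, 1, 1, 1]
profile = {q: hill_number(community, q) for q in (0, 1, 2)}
```

For a perfectly even community all three orders coincide with richness, whereas for a dominated community such as the hypothetical one above the values drop as q increases, which is the signature of rare species and dominance that the text reads off Figure 5a.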
Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.
7.2 | Genetic diversity
Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases, genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.

TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem.
Level | Diversity
3 Hawaiian Archipelago | $D_\gamma = 48.744$
2 Region | $D^{(2)}_\gamma = D_\gamma$, $D^{(2)}_\alpha = 37.773$, $D^{(2)}_\beta = 1.290$
1 Island (community) | $D^{(1)}_\gamma = D^{(2)}_\alpha$, $D^{(1)}_\alpha = 27.752$, $D^{(1)}_\beta = 1.361$
Differentiation among aggregates at each level
2 Region | $\Delta^{(2)}_D = 0.290$
1 Island (community) | $\Delta^{(1)}_D = 0.153$
Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.
As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI despite its higher level of anthropogenic perturbations (Figure 6b,c).
8 | DISCUSSION
Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications, including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.
Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and therefore give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g., bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.
FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1, and 2. (a) Fish species diversity of Hawaiian coral reef communities, (b) genetic diversity for Etelis coruscans, (c) genetic diversity for Zebrasoma flavescens.
As proved by Chao et al. (2015, Appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population, and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost's D, do not possess either of these two properties.
Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.
FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness, (b) allelic richness for Etelis coruscans, (c) allelic richness for Zebrasoma flavescens.

TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci.
Level | Diversity
3 Hawaiian Archipelago | $D_\gamma = 8.249$
2 Region | $D^{(2)}_\gamma = D_\gamma$, $D^{(2)}_\alpha = 8.083$, $D^{(2)}_\beta = 1.016$
1 Island (population) | $D^{(1)}_\gamma = D^{(2)}_\alpha$, $D^{(1)}_\alpha = 7.077$, $D^{(1)}_\beta = 1.117$
Differentiation among aggregates at each level
2 Region | $\Delta^{(2)}_D = 0.023$
1 Island (community) | $\Delta^{(1)}_D = 0.062$
Recently, Karlin and Smouse (2017, Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
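The "true dissimilarity" property can be checked numerically for the two-level, equal-weight case, where the differentiation measure reduces to $(H_\gamma - H_\alpha)/\ln N$. The sketch below is an illustration only; `make_pops` and `shannon_differentiation` are hypothetical names, and equal weights for all populations are assumed.

```python
import math


def make_pops(n, s, a):
    """n populations, each with s equally frequent alleles, a shared by all."""
    pops = []
    for j in range(n):
        pop = {("shared", i): 1.0 / s for i in range(a)}
        pop.update({("private", j, i): 1.0 / s for i in range(s - a)})
        pops.append(pop)
    return pops


def shannon_differentiation(pops):
    """(H_gamma - H_alpha) / ln(N) for N equally weighted populations."""
    n = len(pops)

    def entropy(freqs):
        return -sum(f * math.log(f) for f in freqs if f > 0)

    pooled = {}
    for pop in pops:
        for allele, f in pop.items():
            pooled[allele] = pooled.get(allele, 0.0) + f / n
    h_gamma = entropy(pooled.values())
    h_alpha = sum(entropy(pop.values()) for pop in pops) / n
    return (h_gamma - h_alpha) / math.log(n)


# Populations I and II from the text: 10 alleles each, 4 shared.
delta = shannon_differentiation(make_pops(2, 10, 4))
```

For the example in the text this evaluates to 1 − 4/10 = 0.6, and the same construction gives 1 − A/S for other choices of N, S and A, since the pooled entropy exceeds the mean within-population entropy by exactly (1 − A/S) ln N.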
In this paper, all diversity measures (alpha, beta, and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown, so all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would surely be observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity $D_\beta$ and differentiation $\Delta_D$ measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.
The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.
Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci.
Level | Diversity
3 Hawaiian Archipelago | $D_\gamma = 8.404$
2 Region | $D^{(2)}_\gamma = D_\gamma$, $D^{(2)}_\alpha = 8.290$, $D^{(2)}_\beta = 1.012$
1 Island (community) | $D^{(1)}_\gamma = D^{(2)}_\alpha$, $D^{(1)}_\alpha = 7.690$, $D^{(1)}_\beta = 1.065$
Differentiation among aggregates at each level
2 Region | $\Delta^{(2)}_D = 0.014$
1 Island (community) | $\Delta^{(1)}_D = 0.033$
We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization, and therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)'s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)'s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
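The "simple transformation of the phylogenetic entropy" referred to above is the relation PD = T × exp(I/T) from Table 3, with I = −Σ L_i a_i ln a_i summed over branches. A minimal sketch follows, assuming the tree is supplied as a flat list of (branch length, branch abundance) pairs, where each branch abundance is the summed relative abundance of the tips descending from that branch. The function name and input layout are hypothetical and do not reflect the iDIPphylo interface.

```python
import math


def phylogenetic_diversity_q1(branches, depth):
    """Phylogenetic diversity of order q = 1 for an ultrametric tree.

    branches: list of (L_i, a_i) pairs, one per branch
    depth:    tree depth T
    """
    # Phylogenetic entropy I; branches with zero abundance contribute nothing.
    entropy = -sum(L * a * math.log(a) for L, a in branches if a > 0)
    return depth * math.exp(entropy / depth)


# Star tree: four equally abundant tips, each on a branch of length 3.
star = [(3.0, 0.25)] * 4
pd_star = phylogenetic_diversity_q1(star, depth=3.0)

# Two tips of length 1 joined by a stem of length 1 that carries all abundance.
cherry = [(1.0, 0.5), (1.0, 0.5), (1.0, 1.0)]
pd_cherry = phylogenetic_diversity_q1(cherry, depth=2.0)
```

In the star tree every lineage is fully distinct, so the measure reduces to T times the ordinary Hill number of order 1. In the second tree the shared stem lowers the value below T times the number of tips, which is how relatedness enters the measure.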
There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level is transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithm may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.
The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.
An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.
ACKNOWLEDGEMENTS
This work was assisted through participation in "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.
DATA ARCHIVING STATEMENT
All data used in this manuscript are available in DRYAD (https://doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).
ORCID
Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493
Christine Edwards http://orcid.org/0000-0001-8837-4872
REFERENCES
Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2
Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101
Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.
Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665
Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.
Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.
Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654
Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551
Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247
Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272
Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540
Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471
Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014
Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819
Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083
Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1
Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2
Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036
Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75
Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352
Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x
Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x
Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1
Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x
Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum × falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.
Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.
Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.
Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3
Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x
Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.
Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x
Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x
Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x
Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.
Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x
Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.
Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.
Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696
Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B: Biological Sciences, 283(1829).
Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z
Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765
Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x
Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.
Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422
Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756
de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.
Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x
Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516
Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.
Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.
How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593
APPENDIX S1. Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
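The equivalence described above can be checked numerically. The following is a minimal sketch in R, assuming the standard relations $H_e = 1 - \sum_i p_i^2$ and effective number of alleles $= 1/(1 - H_e)$; the object names `freq`, `He` and `n_eff` are illustrative and are not part of the iDIP code.

```r
# Allele frequencies from the table above (alleles in rows, populations in columns).
freq <- cbind(pop1 = c(0.5, 0.5, 0, 0, 0),
              pop2 = c(0.01, 0.10, 0.665, 0.215, 0.01))

# Expected heterozygosity: one minus the sum of squared allele frequencies.
He <- 1 - colSums(freq^2)

# Effective number of alleles: the number of equally frequent alleles
# that would give the same expected heterozygosity.
n_eff <- 1 / (1 - He)

print(round(He, 3))
print(round(n_eff, 2))
```

Both populations give an expected heterozygosity of about 0.5, and hence an effective number of about two alleles, although population 2 carries five distinct alleles.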
APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol    Definition
$^qD$    abundance diversity of order q, also referred to as Hill number of order q
$^qPD$    phylogenetic diversity of order q
$S$    total number of distinct elements (species/alleles)
$H$    Shannon entropy
$K$    number of regions
$J_k$    number of local populations/communities within region k
$w_{jk}$    weight given to population/community j of region k
$w_{+k}$    weight given to region k
$D_\alpha^{(l)}$    alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$    beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$    gamma abundance diversity at level l of the hierarchy
$l$    superscript to denote population/community, subregion, region, …
$D_\gamma$    gamma abundance diversity at the ecosystem level
$N_{injk}$    number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$    total number of copies of allele/species i in population/community j of region k
$N_{+jk}$    total number of alleles/individuals in population j of region k
$N_{++k}$    total number of alleles/individuals in region k
$N_{+++}$    total number of alleles/individuals in the ecosystem
$p_{i|jk}$    frequency of allele/species i in population j of region k
$p_{i|+k}$    frequency of allele/species i in region k
$p_{i|++}$    frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$    alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$    alpha entropy for region k
$H_\alpha^{(l)}$    total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$    abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$    number of branch segments in the phylogenetic tree
$L_i$    length of branch i = 1, 2, 3, …, B
$a_i$    total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$    mean branch length
$T$    depth of an ultrametric tree ($= \bar{T}$)
$Q$    Rao's quadratic entropy
$I$    phylogenetic entropy
$a_{i|jk}$    total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$    total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$    total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$    alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$    beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$    gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$    gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$    alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$    alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$    total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$    phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)
APPENDIX S3. Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for the two-level hierarchy and then extend all procedures to the three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{m=1}^{J} w_m p_{i|m} \Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$
Theorem S3.1. For the gamma and alpha entropies defined above for the two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \Big[ \sum_{j=1}^{J} w_j p_{i|j} \ln \Big( \sum_{m=1}^{J} w_m p_{i|m} \Big) \Big] \\
&\le -\sum_{i=1}^{S} \Big[ \sum_{j=1}^{J} w_j p_{i|j} \ln \big( w_j p_{i|j} \big) \Big] \\
&= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.
\end{aligned}$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$
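Eq. (C1) is straightforward to evaluate numerically. The sketch below is illustrative only: the helper names `shannon` and `shannon_diff` and the three small frequency matrices are assumptions made for this example and do not come from the iDIP code. It evaluates the measure for identical populations, for completely distinct populations, and for two communities that each hold S = 4 equally common species with A = 1 shared.

```r
# Shannon entropy of a frequency vector (0 * log 0 is taken as 0).
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Shannon differentiation of Eq. (C1).
# P: alleles-by-populations matrix of relative frequencies (columns sum to 1).
# w: population weights summing to 1.
shannon_diff <- function(P, w) {
  H_gamma <- shannon(as.vector(P %*% w))       # entropy of the pooled frequencies
  H_alpha <- sum(w * apply(P, 2, shannon))     # weighted mean within-population entropy
  (H_gamma - H_alpha) / shannon(w)             # normalize by -sum(w * log(w))
}

w <- c(0.5, 0.5)
identical_pops <- cbind(c(0.2, 0.8), c(0.2, 0.8))
distinct_pops  <- cbind(c(0.3, 0.7, 0, 0), c(0, 0, 0.6, 0.4))
# Two communities, S = 4 equally common species each, A = 1 species shared.
one_shared     <- cbind(c(1, 1, 1, 1, 0, 0, 0) / 4,
                        c(1, 0, 0, 0, 1, 1, 1) / 4)

print(shannon_diff(identical_pops, w))
print(shannon_diff(distinct_pops, w))
print(shannon_diff(one_shared, w))
```

Up to floating-point error, the three calls return 0, 1 and 0.75 = 1 − A/S, respectively.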
This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.
(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: If multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.

Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[ H_\alpha^{(2)} - H_\alpha^{(1)} \big] + \big[ H_\gamma - H_\alpha^{(2)} \big]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in the three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions, with allele relative frequency $p_{i|+k}$ for allele i in region k and region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big( \sum_{k=1}^{K} w_{+k} p_{i|+k} \Big) \ln \Big( \sum_{m=1}^{K} w_{+m} p_{i|+m} \Big).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.
To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations, with allele relative frequency $p_{i|jk}$ for allele i in population j and population weights $(w_{1k}/w_{+k}, w_{2k}/w_{+k}, \ldots, w_{J_k k}/w_{+k})$, j = 1, 2, …, $J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$
This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \tag{C6}$$
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \Big[ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \big( w_{jk} p_{i|jk} \big) \Big] \\
&= -\sum_{i=1}^{S} \Big[ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big( \ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k} \Big) \Big] \\
&= -\sum_{i=1}^{S} \Big[ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk} \Big] - \sum_{i=1}^{S} \Big[ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \frac{w_{jk}}{w_{+k}} \Big] - \sum_{i=1}^{S} \Big[ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k} \Big] \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \big[ H_\alpha^{(2)} - H_\alpha^{(1)} \big] + \big[ H_\gamma - H_\alpha^{(2)} \big].
\end{aligned}$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
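As a worked illustration of this special case, consider a hypothetical configuration chosen only for easy arithmetic (it is not taken from the paper's data): K = 2 regions, with two populations in region 1 and one population in region 2, each population fixed for its own private allele, and weights $w_{11} = w_{21} = 1/4$, $w_{12} = 1/2$.

```latex
\begin{aligned}
w_{+1} &= w_{11} + w_{21} = \tfrac12, \qquad w_{+2} = w_{12} = \tfrac12, \\
H_\alpha^{(1)} &= 0 \quad \text{(every population carries a single allele)}, \\
H_\alpha^{(2)} &= w_{+1} \ln 2 + w_{+2} \cdot 0 = \tfrac12 \ln 2, \\
H_\gamma &= -\big( \tfrac14 \ln \tfrac14 + \tfrac14 \ln \tfrac14 + \tfrac12 \ln \tfrac12 \big) = \tfrac32 \ln 2, \\
H_\gamma - H_\alpha^{(2)} &= \ln 2 = -\sum_{k=1}^{2} w_{+k} \ln w_{+k}, \\
H_\alpha^{(2)} - H_\alpha^{(1)} &= \tfrac12 \ln 2
  = -\big( \tfrac14 \ln \tfrac12 + \tfrac14 \ln \tfrac12 + \tfrac12 \ln 1 \big)
  = -\sum_{k=1}^{2} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.
\end{aligned}
```

Both upper bounds are attained, so the among-region and the among-population differentiation measures both equal 1 in this example.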
As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy
For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The gamma and alpha phylogenetic entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[ I_\alpha^{(2)} - I_\alpha^{(1)} \big] + \big[ I_\gamma - I_\alpha^{(2)} \big]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
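The phylogenetic bound can be illustrated in its simplest two-level form (one set of populations, no regions), which the text describes as parallel to the three-level case. The sketch below is a hedged example rather than the authors' iDIP.phylo code: the four-tip tree, its branch lengths, and the helper names `node_abundance` and `phylo_entropy` are all assumptions made for this illustration.

```r
# Hypothetical ultrametric tree of depth T = 2, every branch of length 1:
# tips 1 and 2 descend from internal node A, tips 3 and 4 from internal node B,
# and A and B attach directly to the root.
branch_length <- c(tip1 = 1, tip2 = 1, tip3 = 1, tip4 = 1, nodeA = 1, nodeB = 1)
T_depth <- 2

# Branch abundances a_i: a tip branch carries its own frequency; an internal
# branch carries the summed frequency of the tips descended from it.
node_abundance <- function(p) c(p, p[1] + p[2], p[3] + p[4])

# Phylogenetic entropy I = -sum_i L_i * a_i * ln(a_i), with 0 * log 0 taken as 0.
phylo_entropy <- function(a, L) {
  keep <- a > 0
  -sum(L[keep] * a[keep] * log(a[keep]))
}

# Two equally weighted populations that share no branches.
p1 <- c(0.5, 0.5, 0, 0)
p2 <- c(0, 0, 0.5, 0.5)
w  <- c(0.5, 0.5)

I_alpha <- w[1] * phylo_entropy(node_abundance(p1), branch_length) +
           w[2] * phylo_entropy(node_abundance(p2), branch_length)
I_gamma <- phylo_entropy(node_abundance(w[1] * p1 + w[2] * p2), branch_length)

# Two-level analogue of the upper bound in Eq. (C8): -T * sum(w * log(w)).
upper_bound <- -T_depth * sum(w * log(w))
Delta_PD <- (I_gamma - I_alpha) / upper_bound
print(c(I_alpha = I_alpha, I_gamma = I_gamma, Delta_PD = Delta_PD))
```

Because the two populations share no branches, the difference between the gamma and alpha phylogenetic entropies reaches the bound and the normalized differentiation equals 1.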
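SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represents the ecosystem. Shannon entropies $H_{*}$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1: diagram with columns "Within", "Between", "Total" and "Decomposition" and rows for hierarchy levels 3 (Ecosystem), 2 (Region) and 1 (Population or Community). Level 1 shows population frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$ with entropies $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$, $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$ and total $H_\alpha^{(1)}$; level 2 shows region frequencies $p_{i|+1}$, $p_{i|+2}$ with entropies $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$ and total $H_\alpha^{(2)}$; level 3 shows the ecosystem frequencies $p_{i|++}$ with entropy $H_\gamma$.]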
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2: ultrametric tree with tips $p_1$ to $p_5$, internal nodes labelled $p_1 + p_2$, $p_4 + p_5$ and $p_1 + p_2 + p_3$, and root labelled MRCA.]
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1
        Population 1
        Population 2
    Region 2
        Population 3
        Population 4
        Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

           Pop1   Pop2   Pop3   Pop4   Pop5
Allele1       1     16      2     10     15
Allele2       0      0      0      5     14
Allele3       7     12     11      1      0
Allele4       0      5     14      1     21
Allele5       2      1      0     11     10
Allele6       0      1      3      2      0
The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure of any number of levels can be expressed in a similar manner.

           Pop1          Pop2          Pop3          Pop4          Pop5
Level3     Ecosystem     Ecosystem     Ecosystem     Ecosystem     Ecosystem
Level2     Region1       Region1       Region2       Region2       Region2
Level1     Population1   Population2   Population3   Population4   Population5
For the above simple example, input data for R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))
# Run R function
iDIP(Data, Struc)

Output is given below.

                     [,1]
D_gamma             5.272
D_alpha.2           4.679
D_alpha.1           3.540
D_beta.2            1.127
D_beta.1            1.322
Proportion.2        0.300
Proportion.1        0.700
Differentiation.2   0.204
Differentiation.1   0.310
We give simple interpretations for the above output ("effective" is always in the sense of equally abundant alleles/populations/regions).
(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents.
D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).
(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents.
D_beta1 = 1.322 means that there are 1.32 population equivalents per region.
Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).
(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%.
Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%.
Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.310, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
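As a quick consistency check on the output above, the multiplicative relations can be verified numerically. The sketch below simply hard-codes the reported values (D_gamma = 5.272, D_alpha2 = 4.679, D_alpha1 = 3.540, D_beta2 = 1.127, D_beta1 = 1.322), which are rounded to three decimals, so the products agree only up to rounding. The object names are illustrative and are not part of iDIP.

```r
# Reported iDIP output for the simple example (rounded to three decimals)
D_gamma  <- 5.272
D_alpha2 <- 4.679; D_alpha1 <- 3.540
D_beta2  <- 1.127; D_beta1  <- 1.322

# Multiplicative partitioning: gamma = alpha x beta at each level
check_region     <- D_alpha2 * D_beta2   # should be close to D_gamma
check_population <- D_alpha1 * D_beta1   # should be close to D_alpha2

# Share of the total beta information (log scale) held at each level
total_beta <- log(D_beta1) + log(D_beta2)
prop2 <- log(D_beta2) / total_beta       # regional level
prop1 <- log(D_beta1) / total_beta       # population level

round(c(check_region, check_population, prop2, prop1), 2)
```

The two proportions are the log-scale shares of beta diversity, which is how Proportion2 and Proportion1 relate to D_beta2 and D_beta1.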
R code for Information-based Diversity Partitioning for Any Number of Levels
Input data:
abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
Output:
(1) Gamma (or total) diversity, alpha and beta diversity at each level
(2) Proportion of total beta information found at each level
(3) Differentiation (dissimilarity) for each level. For example, Differentiation1 measures the mean dissimilarity among populations (level 1) within a region; Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
Main function iDIP
iDIP=function(abun,struc){
  n=sum(abun)
  N=ncol(abun)
  ga=rowSums(abun)
  gp=ga[ga>0]/n
  # G: Shannon entropy of the pooled (ecosystem) frequencies
  G=sum(-gp*log(gp))
  S=length(gp)
  H=nrow(struc)
  A=numeric(H-1)
  W=numeric(H-1)
  B=numeric(H-1)
  Diff=numeric(H-1)
  Prop=numeric(H-1)
  # Lowest level: weighted mean of within-population entropies
  wi=colSums(abun)/n
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]))
  pi=sapply(1:N,function(k) abun[,k]/sum(abun[,k]))
  Ai=sapply(1:N,function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])))
  A[H-1]=sum(wi*Ai)
  # Intermediate levels: pool populations that share a label in row i of struc
  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,])
      NN=length(I)
      ai=matrix(0,ncol=NN,nrow=nrow(abun))
      for(j in 1:NN){
        II=which(struc[i,]==I[j])
        if(length(II)==1){ai[,j]=abun[,II]
        }else{ai[,j]=rowSums(abun[,II])}
      }
      pi=sapply(1:NN,function(k) ai[,k]/sum(ai[,k]))
      wi=colSums(ai)/sum(ai)
      W[i-1]=-sum(wi*log(wi))
      Ai=sapply(1:NN,function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])))
      A[i-1]=sum(wi*Ai)
    }
  }
  # Beta diversity, proportion of beta information and differentiation per level
  total=G-A[H-1]
  Diff[1]=(G-A[1])/W[1]
  Prop[1]=(G-A[1])/total
  B[1]=exp(G)/exp(A[1])
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1])
      Prop[i]=(A[i-1]-A[i])/total
      B[i]=exp(A[i-1])/exp(A[i])
    }
  }
  Gamma=exp(G)
  Alpha=exp(A)
  out=matrix(c(Gamma,Alpha,B,Prop,Diff),ncol=1)
  rownames(out)<-c(paste0("D_gamma"),
    paste0("D_alpha",(H-1):1),
    paste0("D_beta",(H-1):1),
    paste0("Proportion",(H-1):1),
    paste0("Differentiation",(H-1):1)
  )
  return(out)
}
Phylogenetic Diversity (R function iDIPphylo)
In addition to the two data matrices (called "Abun" and "Struc", respectively) described for species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.
Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.
Input data:
Data=cbind(c(107020)c(16012511)c(20111403)c(10511112)c(1514021100))
rownames(Data)=paste0("Allele",1:6)
Struc=rbind(rep("Ecosystem",5),c(rep("Region1",2),rep("Region2",3)),paste0("Community",1:5))
Tree=c((((Allele11666254448Allele22886156926)4370264926Allele35919367445)4349065302(Allele4967060281Allele54965919121Allele615361314)5492297125))
Run R function:
iDIPphylo(Data,Struc,Tree)
Output is given below.
                 [,1]
Faith's PD    321.525
mean_T         94.169
PD_gamma      274.388
PD_alpha2     255.194
PD_alpha1     223.231
PD_beta2        1.075
PD_beta1        1.143
PD_prop2        0.351
PD_prop1        0.649
PD_diff2        0.124
PD_diff1        0.149
"Effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.
(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194.
PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma).
(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231.
PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2).
(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%.
PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%.
PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
Input data:
abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study
NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
Output:
(1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
(2) mean_T: weighted (by species abundance) mean of the distances from the root node to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
(3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level
(4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information found in Level 1 and Level 2, respectively
(5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1 measures the mean phylogenetic dissimilarity among communities (Level 1) within a region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among regions (Level 2), etc.
Three packages, "ade4", "ape" and "phytools", must be installed first.
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)
Main function iDIPphylo
iDIPphylo=function(abun,struc,tree){
  phyloData<-newick2phylog(tree)
  Temp<-as.matrix(abun[names(phyloData$leaves),])
  nodenames=c(names(phyloData$leaves),names(phyloData$nodes))
  # M[i,b]=1 if branch b lies on the path from the root to leaf i
  M=matrix(0,nrow=length(phyloData$leaves),ncol=length(nodenames),dimnames=list(names(phyloData$leaves),nodenames))
  for(i in 1:length(phyloData$leaves)){M[i,][unlist(phyloData$paths[i])]=rep(1,length(unlist(phyloData$paths[i])))}
  # pA: abundance descended from each branch, per population; pB: branch lengths
  pA=matrix(0,ncol=ncol(abun),nrow=length(nodenames),dimnames=list(nodenames,colnames(abun)))
  for(i in 1:ncol(abun)){pA[,i]=Temp[,i]%*%M}
  pB=c(phyloData$leaves,phyloData$nodes)
  n=sum(abun)
  N=ncol(abun)
  ga=rowSums(pA)
  gp=ga/n
  TT=sum(gp*pB)
  G=sum(-pB[gp>0]*gp[gp>0]/TT*log(gp[gp>0]/TT))
  PD=sum(pB[gp>0])
  H=nrow(struc)
  A=numeric(H-1)
  W=numeric(H-1)
  B=numeric(H-1)
  Diff=numeric(H-1)
  Prop=numeric(H-1)
  wi=colSums(abun)/n
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]))
  pi=sapply(1:N,function(k) pA[,k]/sum(abun[,k]))
  Ai=sapply(1:N,function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)))
  A[H-1]=sum(wi*Ai)
  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,])
      NN=length(I)
      pi=matrix(0,ncol=NN,nrow=nrow(pA))
      ni=numeric(NN)
      for(j in 1:NN){
        II=which(struc[i,]==I[j])
        if(length(II)==1){pi[,j]=pA[,II]/sum(abun[,II]);ni[j]=sum(abun[,II])
        }else{pi[,j]=rowSums(pA[,II])/sum(abun[,II]);ni[j]=sum(abun[,II])}
      }
      # pi=sapply(1:NN,function(k) ai[,k]/sum(ai[,k]))
      wi=ni/sum(ni)
      W[i-1]=-sum(wi*log(wi))
      Ai=sapply(1:NN,function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)))
      A[i-1]=sum(wi*Ai)
    }
  }
  total=G-A[H-1]
  Diff[1]=(G-A[1])/W[1]
  Prop[1]=(G-A[1])/total
  B[1]=exp(G)/exp(A[1])
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1])
      Prop[i]=(A[i-1]-A[i])/total
      B[i]=exp(A[i-1])/exp(A[i])
    }
  }
  # G and A already contain the log(TT) term, so exp() gives PD directly
  # Gamma=exp(G)*TT;Alpha=exp(A)*TT
  Gamma=exp(G)
  Alpha=exp(A)
  out=matrix(c(PD,TT,Gamma,Alpha,B,Prop,Diff),ncol=1)
  # out1=iDIP(abun,struc)
  # out=cbind(out1,out2)
  rownames(out)<-c(paste("Faith's PD"),
    paste("mean_T"),
    paste0("PD_gamma"),
    paste0("PD_alpha",(H-1):1),
    paste0("PD_beta",(H-1):1),
    paste0("PD_prop",(H-1):1),
    paste0("PD_diff",(H-1):1)
  )
  return(out)
}
2 | GAGGIOTTI et al.
1 | INTRODUCTION
Biological diversity is a foundational concept in the life sciences and critical to strategies for ecological conservation. However, for many decades, biodiversity has been treated in a piecemeal manner, with ecologists focusing on species diversity (but more recently also on trait and phylogenetic diversity) and population geneticists focusing on genetic diversity. This dichotomy has led to large differences in the type of diversity indices that have been used to measure species, trait, phylogenetic and genetic diversity. Ecologists were initially focused on empirical developments and generated a very large number of species diversity indices that strongly differ in their numerical behaviour (Jost, 2006) and estimation properties (Bunge, Willis, & Walsh, 2014). On the other hand, population genetics was initially dominated by theoretical developments and mathematical models focused on a specific set of parameters that described genetic diversity within and among populations, which led to the development of a restricted set of genetic diversity indices. Thus, although biodiversity is inherently a hierarchical concept covering different levels of organization (genetic, population, species, ecological communities and ecosystems), the lack of diversity indices that behave consistently across these different levels has precluded the development of truly integrative biodiversity studies.
Recently, motivated by this lack of common measures for biodiversity at different levels of biological organization, population geneticists have carried out methodological developments that extend the use of popular species diversity indices to the measurement of genetic diversity at different levels of spatial subdivision [e.g., Shannon's and Simpson's indices (Sherwin, Jabot, Rush, & Rossetto, 2006; Smouse, Whitehead, & Peakall, 2015)]. However, simply adapting species diversity measures is not sufficient for two reasons. First, there is much controversy over how to quantify abundance-based species diversity in a community (Mendes, Evangelista, Thomaz, Agostinho, & Gomes, 2008). Second, there has been little agreement on how to partition diversity into its spatial components (Ellison, 2010). A promising solution for a unified measure of genetic diversity centres on Hill numbers (Hill, 1973). Indeed, a consensus is emerging on the use of Hill numbers as a unifying concept to define measures of various types of diversity, including species, phylogenetic and functional diversities (Chao, Chiu, & Jost, 2014). Importantly, Hill numbers follow the replication principle, ensuring that diversity measures are linear in relation to group pooling. As such, they can be used to develop proper partition schemes across spatial scales or other hierarchical structures, such as populations within metapopulations, species within phylogenies and communities within ecosystems, and to pool information across different levels in a hierarchy.
The purpose of this study was to present a unifying framework for the measurement of biodiversity across hierarchical levels of organization, from local population to ecosystem levels. We expect that this new framework will be a useful tool for conservation biologists and will also facilitate the development of the fields of community genetics (Agrawal, 2003) and eco-evolutionary dynamics (Hendry, 2013). This new framework may also facilitate bridging community ecology processes (selection among species, drift, dispersal and speciation) and the processes emphasized by population genetics theory (selection within species, drift, gene flow and mutation), as explored by Vellend et al. (2014). The paper starts by outlining historical developments on the formulation and use of biodiversity measures in the fields of ecology and population genetics (Section 2). We then provide an overview of the use of Hill numbers in ecology and their relationship with population genetic parameters such as Ne (Section 3). Section 4 presents a weighted information-based decomposition framework that provides measures of both genetic and species diversity at all hierarchical levels of spatial subdivision, from populations to ecosystems. This is followed by the description of software that implements the approach (Section 5). Section 6 explores patterns of species and genetic diversity under different spatial subdivision models using simulated data with known diversity hierarchical structures. Section 7 shows an application to a real data set on coral reef biodiversity (Selkoe et al., 2016). We close with a discussion of the advantages and limitations of our approach and its applications in the fields of conservation biology, community genetics and eco-evolutionary dynamics.
2 | HISTORICAL DEVELOPMENTS
Arguably, the ultimate reason for methodological divergence in diversity indices used by population geneticists and community ecologists resides in the very different contexts that led to the emergence of these two disciplines. Ecologists were interested in understanding the processes that determine the structure and composition of communities and could directly measure the community traits (number of species and their abundances) needed to compare different communities. This relatively easy access to real data and an initially limited interest in mechanistic models fostered the development of a large number of diversity measures to explore species distributional data (Magurran, 2004) and eventually made the quantification of abundance-based species diversity one of the most controversial issues in ecology. Population genetics, on the other hand, arose in response to a need to reconcile two opposing views of evolution that hinged on the type of diversity upon which natural selection acted. Darwin proposed that it was small continuous variation, while Galton believed that natural selection acted upon large discontinuous variation (Provine, 1971). Variation in this case was an abstract concept and could not be directly measured, which motivated the development of a vast body of theory centred around mathematical models describing the behaviour of a restricted set of diversity measures (Provine, 1971).
Although ecologists and population geneticists use very different approaches to measure diversity, they are both interested in describing spatial patterns by decomposing total diversity into within- and among-community/population components. But here again, methodological developments differ greatly between the two disciplines. Ecologists engaged in intense debates on the choice of partitioning schemes (Jost, 2007), while population geneticists remained largely faithful to the use of so-called fixation indices proposed by Wright (1951). Nevertheless, the recently established fields of molecular ecology, community genetics and eco-evolutionary dynamics are helping to foster a convergence between the methods used to measure species and genetic diversity. Indeed, in the last decade, population geneticists have begun to extend the use of popular species diversity metrics to the measurement of genetic diversity by deriving mathematical expressions linking them with evolutionary parameters such as effective population size and mutation and migration rates (Chao et al., 2015; Sherwin, 2010; Sherwin et al., 2006; Smouse et al., 2015).
Regardless of this very recent methodological convergence, ecologists and population geneticists face the same challenges when trying to characterize how diversity components (alpha, beta) are structured geographically. These problems have been described in great detail in the literature (e.g., see Jost, 2007, 2010), so here we will only give a very brief summary. The first problem is that the commonly used within-community and within-population abundance diversity measures (e.g., Shannon-Wiener index and heterozygosity) are in fact entropies, meaning that they quantify the uncertainty in the species or allele identity of randomly sampled individuals or alleles, respectively. Importantly, these indices do not scale linearly with an increase in diversity, and some of them (e.g., heterozygosity) reach an asymptote for large values. The second problem is that the "within-" (alpha) and "between-" (beta) components of diversity are not independent. Intuitively, if beta depends on alpha, it would be impossible to compare beta diversities across all levels at which alpha diversities differ.
Partitioning components of diversity is central to progress on these problems. Ecologists have related the traditional alpha, beta and gamma diversity using both additive and multiplicative schemes of partitioning. On the other hand, population geneticists have always used the multiplicative scheme based on the partitioning of the probability of identity by descent of pairs of alleles (inbreeding coefficients, F). Although there has been some confusion (cf. Jost, 2008; Jost et al., 2010; Meirmans & Hedrick, 2011), it is easy to demonstrate that all estimators of $F_{ST}$, a parameter that quantifies genetic structure, including $G_{ST}$ (Nei, 1973) and $\theta$ (Weir & Cockerham, 1984), are based on the well-known multiplicative decomposition of Wright's (1951) F-statistics, $(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST})$, where all terms are entropy measures describing the uncertainty in the identity by descent of pairs of alleles when they are sampled from the whole set of populations (metapopulation) ($1 - F_{IT}$), from within the same population ($1 - F_{IS}$) or from two different populations ($1 - F_{ST}$).
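The multiplicative decomposition of the F-statistics can be illustrated with a small numerical sketch. It assumes the standard heterozygosity-based definitions (F_IS = 1 - H_I/H_S, F_ST = 1 - H_S/H_T, F_IT = 1 - H_I/H_T), which are not spelled out in the text above, and the three heterozygosity values are made up for illustration.

```r
# Hypothetical heterozygosities (illustrative values only)
H_I <- 0.30   # mean observed heterozygosity of individuals
H_S <- 0.36   # mean expected heterozygosity within populations
H_T <- 0.45   # expected heterozygosity of the pooled metapopulation

# Standard heterozygosity-based fixation indices (assumed definitions)
F_IS <- 1 - H_I / H_S
F_ST <- 1 - H_S / H_T
F_IT <- 1 - H_I / H_T

# Multiplicative decomposition: (1 - F_IT) = (1 - F_IS) * (1 - F_ST)
lhs <- 1 - F_IT
rhs <- (1 - F_IS) * (1 - F_ST)
all.equal(lhs, rhs)
```

Under these definitions the identity holds for any heterozygosity values, because each factor is a ratio of heterozygosities at adjacent levels.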
As mentioned earlier, ecologists engaged in intense debates on how to partition species diversity, but in a recent Ecology forum (Ellison, 2010), contributors agreed that a first step towards reaching a consensus was to adopt Hill numbers to measure diversity. Discussions among population geneticists are less advanced because of their traditional focus on the use of genetic polymorphism data to estimate important evolutionary parameters, which requires that genetic diversity statistics be effective measures of the causes and consequences of genetic differentiation (e.g., Whitlock, 2011). Much theoretical work is still needed to demonstrate that diversity measures based on information theory do satisfy this requirement. Here, instead, we argue that the adoption of Hill numbers in population genetics is also a good starting point to reach a consensus on how to partition genetic diversity. In what follows, we first introduce Hill numbers and then present a weighted information-based decomposition framework applicable to both community and population genetics studies.
3 | OVERVIEW OF HILL NUMBERS
There are now many articles describing the application of Hill numbers. Here, we follow Jost (2006), who reintroduced their use in ecology. As Jost (2006) noted, most diversity indices are in fact entropies that measure the uncertainty in the identity of species (or alleles) in a sample. However, true diversity measures should provide estimates of the number of distinct elements (species or alleles) in an aggregate (community or population). To derive such measures, we first note that diversity indices create equivalence classes among aggregates, in the sense that all aggregates with the same diversity index value can be considered as equivalent. For example, all populations with the same heterozygosity value are equivalent in terms of this index, even if they have radically different allele frequencies (see Appendix S1 for an example). Moreover, for any given heterozygosity, there will be an "ideal" population in which all alleles are equally frequent. It is therefore possible to define an "effective number of elements" (alleles in this example) as the number of equally frequent elements in an "ideal aggregate" that has the same diversity index value as the "real aggregate." An example of effective number in an ecological context is the effective number of species introduced by MacArthur (1965), while an equivalent concept in population genetics is the effective number of alleles (Kimura & Crow, 1964).
Note that the concept of effective population size, Ne, used in population genetics is analogous to that of Hill numbers but rests on a rather different rationale. More precisely, Ne is defined as the number of individuals in an ideal (Wright–Fisher) population that has the same magnitude of random genetic drift as the real population being studied. There are different ways in which we can measure the strength of genetic drift, the most common being change in average inbreeding coefficient, change in allele frequency variance and rate of loss of heterozygosity, and each leads to a different type of effective size. Thus, the ideal and the real populations are equivalent in terms of the rate of loss of genetic diversity and not in terms of equal representation of distinct individuals. Probably the only similarity between Ne and the rationale underlying Hill numbers is that all the individuals in the ideal population contribute equally (on average) to the gene pool of the next generation.
The application of the above-stated logic to any of the many different entropy measures used in ecology and population genetics yields a single expression for diversity:

$${}^{q}D \equiv \left( \sum_{i=1}^{S} p_i^{q} \right)^{1/(1-q)} \tag{1}$$

where $S$ denotes the number of species or alleles, $p_i$ denotes the relative abundance or frequency of species or allele $i$, and the exponent and superscript $q$ is the order of the diversity and indicates the sensitivity of ${}^{q}D$, the numbers equivalent of the diversity measure being used, to common and rare elements (Jost, 2006). The diversity of order zero ($q = 0$) is completely insensitive to species or allele frequencies and is known, respectively, as species or allelic richness, depending on whether it is applied to species or allele frequency data. The diversity of order one ($q = 1$) weights the contribution of each species or allele by its frequency, without favouring either common or rare species/alleles. Although Equation 1 is not defined for $q = 1$, its limit exists (Jost, 2006) and equals the exponential of the Shannon entropy $H$ (Equation 2). All values of $q$ greater than unity disproportionally favour the most common species or allele. For example, the Simpson concentration and the Gini–Simpson index, which are, respectively, equivalent to expected homozygosity and expected heterozygosity when applied to allele frequency data, lead to diversities of order 2 (Equation 3) and give the same effective number of species or alleles.
It is worth emphasizing that among all these different number equivalents or true diversity measures, the diversity of order 1 is key because of its ability to weigh elements precisely by their frequency, without favouring either rare or common elements (Jost, 2006). Therefore, we will use this measure to define our new framework for diversity decomposition.
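To make the role of the order q concrete, the following minimal sketch evaluates Hill numbers for two hypothetical allele frequency vectors. The function name `hill_number` and the example frequencies are illustrative assumptions, not part of the iDIP code.

```r
# Hill number (effective number of species or alleles) of order q.
# `hill_number` is an illustrative helper name, not part of iDIP.
hill_number <- function(p, q) {
  p <- p[p > 0] / sum(p[p > 0])        # drop absent elements and normalise
  if (abs(q - 1) < 1e-8) {
    exp(-sum(p * log(p)))              # limit q -> 1: exp of Shannon entropy
  } else {
    sum(p^q)^(1 / (1 - q))             # Equation 1
  }
}

p_even   <- rep(1 / 4, 4)              # four equally frequent alleles
p_uneven <- c(0.7, 0.1, 0.1, 0.1)      # one dominant allele

even_profile   <- sapply(c(0, 1, 2), function(q) hill_number(p_even, q))
uneven_profile <- sapply(c(0, 1, 2), function(q) hill_number(p_uneven, q))
even_profile     # equals 4 for every order q
uneven_profile   # decreases as q increases
```

When all elements are equally frequent, the effective number equals the actual number for every order; with uneven frequencies, higher orders give progressively more weight to the dominant element.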
4 | WEIGHTED INFORMATION-BASED DECOMPOSITION FRAMEWORK (q = 1)
Our decomposition framework is focused on the information-based diversity measure (Hill number of order q = 1). In what follows, we first describe the framework in terms of abundance (species/genetic) diversities, and then we provide an equivalent formulation in terms of phylogenetic diversity. For simplicity, we will use the notation D to refer to abundance diversities and PD to refer to phylogenetic diversities, both of order q = 1. Appendix S2 lists all notation and definitions of the parameters and variables we used.
4.1 | Formulation in terms of abundance diversity
Here, we develop a framework applicable to both species (abundance, presence–absence, biomass) and genetic data to estimate alpha, beta and gamma diversities (i.e., diversity components) across different levels of a hierarchical spatial structure. In this section, we consider a very simple example of an ecosystem subdivided into multiple regions, each of which in turn is subdivided into a number of communities when considering species data, or a number of populations when considering genetic data. However, our formulation is applicable to any number of levels within a spatially hierarchical partitioning scheme and their associated number of communities and populations at each level (nested scale), such as the example considered in our simulation study below (see Figure 1). Indeed, the framework described here allows decomposing species and genetic information on an equal footing, thus allowing contrasting diversity components across communities and populations. In other words, if genetic and species abundance (or presence–absence) data are available for every population and every species, then genetic and species diversity components can be contrasted within and among spatial scales as well as across different phylogenetic levels. Note that our proposed framework is based on diversities of order q = 1, which are less sensitive than diversities of higher order to the fact that genetic information is not available for all individuals in a population but is rather based on subsamples of individuals within populations. As such, using q = 1 allows one to decompose genetic variation consistently across different spatial subdivision levels that may vary in abundance.
The final objective was to decompose the global (ecosystem) diversity into its regional and community/population-level components. We do this using the well-known additive property of Shannon entropy across hierarchical levels (and thus multiplicative partitioning of diversity) (Batty, 1976; Jost, 2007). Table 1 presents the diversities (number equivalents) that need to be estimated at each level of the hierarchy. For each level, there will be one value corresponding to species diversity and another corresponding to allelic (genetic) diversity of a particular species at a given locus (or an average across loci). Figure S1 provides a schematic representation of the calculation of diversities.
From Table 1, it is apparent that we only need to use Equation 2 to calculate three diversity indices, namely $D_{\alpha}^{(1)}$, $D_{\alpha}^{(2)}$ and $D_{\gamma}$. These diversity measures are defined in terms of relative abundances of the distinct elements (species or alleles) at the respective levels of the hierarchy. In what follows, we first present the framework as applied to allele count data and then explain how a simple change in the definition of a single parameter allows the application of the same framework to species abundance data. We assume that we are considering a diploid species (but the scheme can be easily generalized for polyploid species) and focus on the diversity of order q = 1, which is based on the Shannon entropy (see Equation 1).
Genetic diversity indices are calculated separately for each locus, so we focus here on a locus with $S$ alleles. Additionally, we consider an ecosystem subdivided into $K$ regions, each having $J_k$ local populations. Let $N_{injk}$ be the number of diploid individuals with $n$ (= 0, 1, 2) copies of allele $i$ in population $j$ and region $k$. Then, the total number of copies of allele $i$ in population $j$ and region $k$ is $N_{ijk} = \sum_{n=0}^{2} n N_{injk}$, and from this, we can derive the total number of alleles in population $j$ and region $k$ as $N_{+jk} = \sum_{i=1}^{S} N_{ijk}$, the total number of alleles in region $k$ as $N_{++k} = \sum_{j=1}^{J_k} N_{+jk}$ and the total number of alleles in the ecosystem as $N_{+++} = \sum_{k=1}^{K} N_{++k}$.
All allele frequencies can be derived from these allele counts. For example, the relative frequency of allele $i$ in any given population $j$ within region $k$ is $p_{i \mid jk} = N_{ijk}/N_{+jk}$. In the case of region- and ecosystem-level allele frequencies, we pool over populations within regions and over all regions and populations within an ecosystem, respectively. We define the weight for population $j$ and region $k$ as $w_{jk} = N_{+jk}/N_{+++}$; the weight for region $k$ thus becomes $w_{+k} = \sum_{j=1}^{J_k} w_{jk} = N_{++k}/N_{+++}$. Table 2 describes how allele/species relative frequencies at each level are calculated in terms of these weight functions.

$${}^{1}D = \exp\left( -\sum_{i=1}^{S} p_i \ln p_i \right) = \exp(H) \tag{2}$$

$${}^{2}D = 1 \Big/ \left( \sum_{i=1}^{S} p_i^{2} \right) \tag{3}$$
Using these frequencies, we can calculate the genetic diversities at each level of spatial organization. Table 3 presents the formulas for $D_{\alpha}^{(1)}$, $D_{\alpha}^{(2)}$ and $D_{\gamma}$; all other diversity measures can be derived from them (see Table 1). In the case of the ecosystem diversity, this amounts to simply replacing $p_i$ in Equation 2 by $p_{i \mid ++}$, the allele frequency at the ecosystem level (see Table 2). To calculate the diversity at the regional level, we first calculate the entropy $H_{\alpha k}^{(2)}$ for each individual region $k$ and then obtain the weighted average over all regions, $H_{\alpha}^{(2)}$. Finally, we calculate the exponential of the region-level entropy to obtain $D_{\alpha}^{(2)}$, the alpha diversity at the regional level. We proceed in a similar fashion to obtain $D_{\alpha}^{(1)}$, the diversity at the population level, but in this case, we need to average over regions and populations within regions.
The calculation of the equivalent diversities based on species count data can be carried out using the exact same procedure described above, but in this case, $N_{ijk}$ represents the number of individuals of species $i$ in population $j$ and region $k$. All formulas for gamma, alpha and beta, along with the differentiation measures at each level, are given in Table 3. The formulas can be directly generalized to any arbitrary number of levels (see Section 5).
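The calculation described in this section can be sketched in a few lines of R for a three-level hierarchy. This is a minimal illustration of the weighting and pooling steps, not the iDIP function itself; the count matrix, the region labels and all object names are invented for the example.

```r
# Toy allele counts: 3 alleles (rows) x 4 populations (columns); values are made up
counts <- cbind(c(10, 5, 0), c(8, 6, 1), c(0, 4, 12), c(2, 3, 9))
region <- c("R1", "R1", "R2", "R2")       # region membership of each population

shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# Population level: weights w_jk and weighted mean of within-population entropies
w_pop    <- colSums(counts) / sum(counts)
H_alpha1 <- sum(w_pop * apply(counts, 2, function(x) shannon(x / sum(x))))

# Region level: pool counts within regions, weights w_+k
reg_counts <- t(rowsum(t(counts), region))   # alleles x regions
w_reg      <- colSums(reg_counts) / sum(counts)
H_alpha2   <- sum(w_reg * apply(reg_counts, 2, function(x) shannon(x / sum(x))))

# Ecosystem level: pool all populations
H_gamma <- shannon(rowSums(counts) / sum(counts))

# Diversities are exponentials of entropies; betas are ratios of adjacent levels
D_alpha1 <- exp(H_alpha1); D_alpha2 <- exp(H_alpha2); D_gamma <- exp(H_gamma)
D_beta1  <- D_alpha2 / D_alpha1            # population equivalents per region
D_beta2  <- D_gamma / D_alpha2             # region equivalents
c(D_gamma = D_gamma, D_alpha2 = D_alpha2, D_alpha1 = D_alpha1,
  D_beta2 = D_beta2, D_beta1 = D_beta1)
```

Because population weights are proportional to sample size, pooling raw counts within a region gives the same regional frequencies as the weighted average of population frequencies, and the gamma diversity factorizes exactly as D_alpha1 x D_beta1 x D_beta2.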
4.2 | Formulation in terms of phylogenetic diversity
We first present an overview of phylogenetic diversity measures applied to a single nonhierarchical case, henceforth referred to as single aggregate for brevity, and then extend it to consider a hierarchically structured system.
4.2.1 | Phylogenetic diversity measures in a single aggregate
To formulate phylogenetic diversity in a single aggregate, we assume that all species or alleles in an aggregate are connected by a rooted ultrametric or nonultrametric phylogenetic tree, with all species/alleles as tip nodes. All phylogenetic diversity measures discussed below are computed from a given fixed tree base or a time reference point that is ancestral to all species/alleles in the aggregate. A convenient time reference point is the age of the root of the phylogenetic tree spanned by all elements. Assume that there are $B$ branch segments in the tree, and thus there are $B$ corresponding nodes, $B \geq S$. The set of species/alleles is expanded to include also the internal nodes as well as the terminal nodes representing species/alleles, which will then be the first $S$ elements (see Figure S2).
Let $L_i$ denote the length of branch $i$ in the tree, $i = 1, 2, \ldots, B$. We first expand the set of relative abundances of elements $(p_1, p_2, \ldots, p_S)$ (see Equation 1) to a larger set $\{a_i, i = 1, 2, \ldots, B\}$ by defining $a_i$ as the total relative abundance of the elements descended from the $i$th node/branch, $i = 1, 2, \ldots, B$. In phylogenetic diversity, an important parameter is the mean branch length $\bar{T}$, the abundance-weighted mean of the distances from the tree base to each of the terminal branch tips, that is, $\bar{T} = \sum_{i=1}^{B} L_i a_i$. For an ultrametric tree, the mean branch length simply reduces to the tree depth $T$; see Figure 1 in Chao, Chiu, and Jost (2010) for an example. For simplicity, our following formulation of phylogenetic diversity is based on ultrametric trees. The extension to nonultrametric trees is straightforward (via replacing $T$ by $\bar{T}$ in all formulas).

FIGURE 1 The spatial representation of 32 populations organized into a spatial hierarchy based on three scale levels: subregions (eight populations each), regions (16 populations each) and the ecosystem (all 32 populations). The dendrogram (upper panel, hierarchical representation of levels) represents the spatial relationship (i.e., geographic distance) in which each tip represents a population found in a particular site (lower panel). The cartographic representation (lower panel) represents the spatial distribution of these same populations along a geographic coordinate system.
Chaoetal(20102014)generalizedHillnumberstoaclassofphy-logenetic diversity of order q qPDderivedas
This measure quantifies the effective total branch lengthduring the time interval from Tyearsagoto thepresent Ifq = 0 then 0PD=
sumB
i=1Liwhich isthewell-knownFaithrsquosPDthesumof
the branch lengths of a phylogenetic tree connecting all speciesHowever this measure does not consider species abundancesRaorsquos quadratic entropy Q (Rao amp Nayak 1985) is a widely usedmeasure which takes into account both phylogeny and speciesabundancesThismeasureisageneralizationoftheGinindashSimpsonindex and quantifies the average phylogenetic distance between
anytwoindividualsrandomlyselectedfromtheassemblageChaoetal(2010)showedthattheqPDmeasureoforderq = 2 is a sim-ple transformationofquadraticentropy that is2PD=T∕(1minusQ∕T) Again here we focus on qPDmeasureoforderq = 1 which can be expressedasa functionof thephylogenetic entropy (AllenKonampBar-Yam2009)
HereIdenotesthephylogeneticentropy
whichisageneralizationofShannonrsquosentropythatincorporatesphy-logeneticdistancesamongelementsNotethatwhenthereareonlytipnodesandallbrancheshaveunitlengththenwehaveT = 1 and qPDreducestoHillnumberoforderq(inEquation1)
422emsp|emspPhylogenetic diversity decomposition in a multiple- level hierarchically structured system
The single-aggregate formulation can be extended to consider ahierarchical spatially structured system For the sake of simplic-ity we consider three levels (ecosystem region and communitypopulation) aswe did for the speciesallelic diversity decomposi-tion Assume that there are Selements in theecosystemFor therootedphylogenetictreespannedbyallS elements in the ecosys-temwedefineroot(oratimereferencepoint)numberofnodesbranches B and branch length Li in a similar manner as those in a single aggregate
Forthetipnodesasintheframeworkofspeciesandallelicdi-versity(inTable2)definepi|jk pi|+k and pi|++ i = 1 2 hellip S as the ith speciesorallelerelativefrequenciesatthepopulationregionalandecosystemlevelrespectivelyToexpandtheserelativefrequenciesto the branch set we define ai|jk i = 1 2 hellip B as the summed rela-tiveabundanceofthespeciesallelesdescendedfromtheith nodebranchinpopulation j and region k with similar definitions for ai|+k and ai|++ i = 1 2 hellip B seeFigure1ofChaoetal (2015) foran il-lustrativeexampleThedecompositionforphylogeneticdiversityissimilartothatforHillnumberspresentedinTable1exceptthatnowallmeasuresarereplacedbyphylogeneticdiversityThecorrespond-ingphylogeneticgammaalphaandbetadiversitiesateachlevelare
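$$ {}^{q}PD = \left\{ \sum_{i=1}^{B} L_i \left( \frac{a_i}{T} \right)^{q} \right\}^{1/(1-q)} \tag{4} $$

$$ {}^{1}PD = \lim_{q \to 1} {}^{q}PD = \exp\left[ -\sum_{i=1}^{B} L_i \frac{a_i}{T} \ln\left( \frac{a_i}{T} \right) \right] \equiv T \exp(I/T) \tag{5} $$

$$ I = -\sum_{i=1}^{B} L_i a_i \ln a_i \tag{6} $$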
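TABLE 1 Various diversities in a hierarchically structured system and their decomposition based on diversity measure $D = {}^{1}D$ (Hill number of order q = 1 in Equation 2); for phylogenetic diversity decomposition, replace D with $PD = {}^{1}PD$ (phylogenetic diversity measure of order q = 1 in Equation 5); see Table 3 for all formulas for D and PD. The superscripts (1) and (2) denote the hierarchical level of focus.

| Hierarchical level | Within | Between | Total | Decomposition |
| --- | --- | --- | --- | --- |
| 3 Ecosystem | − | − | $D_\gamma$ | $D_\gamma = D_\alpha^{(1)} D_\beta^{(1)} D_\beta^{(2)}$ |
| 2 Region | $D_\alpha^{(2)}$ | $D_\beta^{(2)} = D_\gamma^{(2)} / D_\alpha^{(2)}$ | $D_\gamma^{(2)} = D_\gamma$ | $D_\gamma = D_\alpha^{(2)} D_\beta^{(2)}$ |
| 1 Community or population | $D_\alpha^{(1)}$ | $D_\beta^{(1)} = D_\gamma^{(1)} / D_\alpha^{(1)}$ | $D_\gamma^{(1)} = D_\alpha^{(2)}$ | $D_\alpha^{(2)} = D_\alpha^{(1)} D_\beta^{(1)}$ |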
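TABLE 2 Calculation of allele/species relative frequencies at the different levels of the hierarchical structure

| Hierarchical level | Species/allele relative frequency |
| --- | --- |
| Population | $p_{i\mid jk} = N_{ijk} / N_{+jk} = N_{ijk} / \sum_{i=1}^{S} N_{ijk}$ |
| Region | $p_{i\mid +k} = N_{i+k} / N_{++k} = \sum_{j=1}^{J_k} (w_{jk} / w_{+k})\, p_{i\mid jk}$ |
| Ecosystem | $p_{i\mid ++} = N_{i++} / N_{+++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i\mid jk}$ |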
given in Table 3, along with the corresponding differentiation measures. Appendix S3 presents all mathematical derivations and discusses the desirable monotonicity and “true dissimilarity” properties that our proposed differentiation measures possess.
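The species/allelic column of Table 3 can be traced through a small worked example. The sketch below is an illustration under stated assumptions, not the iDIP code: the nested-list input layout and the function names are hypothetical, and the weights are taken to be proportional to population size ($w_{jk} = N_{+jk}/N_{+++}$), which is the choice implied by the frequency formulas in Table 2.

```python
import math

def shannon_entropy(freqs):
    """Shannon entropy -sum p ln p, skipping zero frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def decompose(counts):
    """Three-level decomposition of order q = 1 (ecosystem, region, population).

    counts[k][j][i] is N_ijk, the count of species/allele i in population j
    of region k. Assumes at least two regions, at least one region with two
    or more populations, and no empty population (otherwise a normalizer
    below is zero).
    """
    n_total = sum(sum(pop) for region in counts for pop in region)
    n_elements = len(counts[0][0])
    pooled = [0.0] * n_elements      # ecosystem frequencies p_{i|++}
    h_alpha1 = h_alpha2 = 0.0        # H_alpha^(1), H_alpha^(2)
    denom1 = denom2 = 0.0            # normalizers of Delta^(1), Delta^(2)
    for region in counts:
        n_region = sum(sum(pop) for pop in region)
        w_region = n_region / n_total
        p_region = [sum(pop[i] for pop in region) / n_region
                    for i in range(n_elements)]
        h_alpha2 += w_region * shannon_entropy(p_region)
        denom2 -= w_region * math.log(w_region)
        for pop in region:
            n_pop = sum(pop)
            w_pop = n_pop / n_total
            p_pop = [c / n_pop for c in pop]
            h_alpha1 += w_pop * shannon_entropy(p_pop)
            denom1 -= w_pop * math.log(w_pop / w_region)
            for i, p in enumerate(p_pop):
                pooled[i] += w_pop * p
    h_gamma = shannon_entropy(pooled)
    return {
        "D_gamma": math.exp(h_gamma),
        "D_alpha2": math.exp(h_alpha2),
        "D_beta2": math.exp(h_gamma - h_alpha2),
        "D_alpha1": math.exp(h_alpha1),
        "D_beta1": math.exp(h_alpha2 - h_alpha1),
        "Delta2": (h_gamma - h_alpha2) / denom2,
        "Delta1": (h_alpha2 - h_alpha1) / denom1,
    }

# Toy data: two regions that share no elements; within each region the two
# populations are identical.
toy = [[[10, 10, 0, 0], [10, 10, 0, 0]],
       [[0, 0, 10, 10], [0, 0, 10, 10]]]
result = decompose(toy)
```

In this toy case the regions are completely distinct and the populations within a region are identical, so the among-region differentiation is at its maximum and the within-region differentiation is zero.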
5 | IMPLEMENTATION OF THE FRAMEWORK BY MEANS OF AN R PACKAGE
The framework described above has been implemented in the R function iDIP (information-based Diversity Partitioning), which is provided as Data S1. We also provide a short introduction with a simple example data set to explain how to obtain numerical results equivalent to those provided in Tables 4 and 5 below for the Hawaiian archipelago example data set.
The R function iDIP requires two input matrices:

1. Abundance data specifying species/alleles (rows) raw or relative abundances for each population/community (columns).
2. Structure matrix describing the hierarchical structure of spatial subdivision; see a simple example given in Data S1. There is no limit to the number of spatial subdivisions.
The output includes (i) gamma (or total) diversity, alpha and beta diversity for each level; (ii) proportion of total beta information (among aggregates) found at each level; and (iii) mean differentiation (dissimilarity) at each level.
We also provide the R function iDIPphylo, which implements an information-based decomposition of phylogenetic diversity and therefore can take into account the evolutionary history of the species being studied. This function requires the two matrices mentioned above plus a phylogenetic tree in Newick format. For interested users without knowledge of R, we also provide an online version available from https://chao.shinyapps.io/iDIP/. This interactive web application was developed using Shiny (https://shiny.rstudio.com). The webpage contains tabs providing a short introduction describing how to use the tool, along with a detailed User's Guide, which provides proper interpretations of the output through numerical examples.
6 | SIMULATION STUDY TO SHOW THE CHARACTERISTICS OF THE FRAMEWORK
Here, we describe a simple simulation study to demonstrate the utility and numerical behaviour of the proposed framework. We considered an ecosystem composed of 32 populations divided into four hierarchical levels (ecosystem, region, subregion, population; Figure 1). The number of populations at each level was kept constant across all simulations (i.e., ecosystem with 32 populations, regions with 16 populations each and subregions with eight
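TABLE 3 Formulas for α, β and γ, along with differentiation measures, at each hierarchical level of spatial subdivision for species/allelic diversity and phylogenetic diversity. Here $D = {}^{1}D$ (Hill number of order q = 1 in Equation 2); $PD = {}^{1}PD$ (phylogenetic diversity of order q = 1 in Equation 5); T denotes the depth of an ultrametric tree; H = Shannon entropy (Equation 2); I = phylogenetic entropy (Equation 6).

| Hierarchical level | Diversity | Species/allelic diversity | Phylogenetic diversity |
| --- | --- | --- | --- |
| Level 3: Ecosystem | gamma | $D_\gamma = \exp\left\{ -\sum_{i=1}^{S} p_{i\mid ++} \ln p_{i\mid ++} \right\} \equiv \exp(H_\gamma)$ | $PD_\gamma = T \times \exp\left\{ -\sum_{i=1}^{B} L_i a_{i\mid ++} \ln a_{i\mid ++} / T \right\} \equiv T \times \exp(I_\gamma / T)$ |
| Level 2: Region | gamma | $D_\gamma^{(2)} = D_\gamma$ | $PD_\gamma^{(2)} = PD_\gamma$ |
| | alpha | $D_\alpha^{(2)} = \exp(H_\alpha^{(2)})$, where $H_\alpha^{(2)} = \sum_{k} w_{+k} H_{\alpha k}^{(2)}$ and $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i\mid +k} \ln p_{i\mid +k}$ | $PD_\alpha^{(2)} = T \times \exp(I_\alpha^{(2)} / T)$, where $I_\alpha^{(2)} = \sum_{k} w_{+k} I_{\alpha k}^{(2)}$ and $I_{\alpha k}^{(2)} = -\sum_{i=1}^{B} L_i a_{i\mid +k} \ln a_{i\mid +k}$ |
| | beta | $D_\beta^{(2)} = D_\gamma^{(2)} / D_\alpha^{(2)}$ | $PD_\beta^{(2)} = PD_\gamma^{(2)} / PD_\alpha^{(2)}$ |
| Level 1: Population or community | gamma | $D_\gamma^{(1)} = D_\alpha^{(2)}$ | $PD_\gamma^{(1)} = PD_\alpha^{(2)}$ |
| | alpha | $D_\alpha^{(1)} = \exp(H_\alpha^{(1)})$, where $H_\alpha^{(1)} = \sum_{j,k} w_{jk} H_{\alpha jk}^{(1)}$ and $H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i\mid jk} \ln p_{i\mid jk}$ | $PD_\alpha^{(1)} = T \times \exp(I_\alpha^{(1)} / T)$, where $I_\alpha^{(1)} = \sum_{j,k} w_{jk} I_{\alpha jk}^{(1)}$ and $I_{\alpha jk}^{(1)} = -\sum_{i=1}^{B} L_i a_{i\mid jk} \ln a_{i\mid jk}$ |
| | beta | $D_\beta^{(1)} = D_\gamma^{(1)} / D_\alpha^{(1)}$ | $PD_\beta^{(1)} = PD_\gamma^{(1)} / PD_\alpha^{(1)}$ |

Differentiation among aggregates at each level

| Hierarchical level | Species/allelic diversity | Phylogenetic diversity |
| --- | --- | --- |
| Level 2: Among regions | $\Delta_D^{(2)} = \dfrac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k} w_{+k} \ln w_{+k}}$ | $\Delta_{PD}^{(2)} = \dfrac{I_\gamma - I_\alpha^{(2)}}{-T \sum_{k} w_{+k} \ln w_{+k}}$ |
| Level 1: Population/community within region | $\Delta_D^{(1)} = \dfrac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{j,k} w_{jk} \ln(w_{jk} / w_{+k})}$ | $\Delta_{PD}^{(1)} = \dfrac{I_\alpha^{(2)} - I_\alpha^{(1)}}{-T \sum_{j,k} w_{jk} \ln(w_{jk} / w_{+k})}$ |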
populations each). Note that here we used a hierarchy with four spatial subdivisions instead of the three levels used in the presentation of the framework. We used three levels to keep the presentation of the calculations simple, and four levels in the simulations to verify the performance of the framework in a more in-depth manner.
We explored six scenarios varying in the degree of genetic structuring from very strong (Figure 2, top left panel) to very weak (Figure 2, bottom right panel), and for each we generated spatially structured genetic data for 10 unlinked bi-allelic loci using an algorithm loosely based on the genetic model of Coop, Witonsky, Di Rienzo, and Pritchard (2010). More explicitly, to generate correlated allele frequencies across populations for bi-allelic loci, we draw 10 random vectors of dimension 32 from a multivariate normal distribution with mean zero and a covariance matrix corresponding to the particular genetic structure scenario being considered. To construct the covariance matrix, we first assumed that the covariance between populations decreased with distance, so that the off-diagonal elements (covariances) for closest geographic neighbours were set to 4, those for the second nearest neighbours were set to 3, and so on; the main diagonal values (variances) were set to 5. By multiplying the off-diagonal elements of this variance–covariance matrix by a constant (δ), we manipulated the strength of the spatial genetic structure from strong (δ = 0.1; Figure 2) to weak (δ = 6; Figure 2). Delta values were chosen to demonstrate gradual changes in estimates across diversity components. Using this procedure, we generated a matrix of random normally distributed N(0, 1) deviates $\varepsilon_{il}$ for each population i and locus l. The random deviates were transformed into allele frequencies constrained between 0 and 1 using the simple transform
where $p_{il}$ is the relative frequency of allele A1 at the lth locus in population i, and therefore $q_{il} = (1 - p_{il})$ is the relative frequency of allele A2. Each bi-allelic locus was analysed separately by our framework, and estimated values of Dγ, Dα and Dβ for each spatial level (see Figure 1) were averaged across the 10 loci.
To simulate a realistic distribution of number of individuals across populations, we generated random values from a log-normal distribution with mean 0 and log of standard deviation 1; these values were then multiplied by randomly generated deviates from a Poisson distribution with λ = 30 to obtain a wide range of population/community sizes. Rounded values (to mimic abundances of individuals) were then multiplied by $p_{il}$ and $q_{il}$ to generate allele abundances. Given that the number of individuals was randomly generated across populations, there is no spatial correlation in abundance of individuals across the landscape, which means that the genetic spatial patterns were solely determined by the variance–covariance matrix used to generate correlated allele frequencies across populations. This facilitates interpretation of the simulation results, allowing us to demonstrate that the framework can uncover subtle spatial effects associated with population connectivity (see below).
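The last two steps of the simulation (truncating deviates to allele frequencies and turning them into allele abundances) can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the log-normal is parameterized with mean 0 and standard deviation 1 on the log scale (one reading of the description above), the Poisson sampler is a simple textbook method, allele counts are rounded to integers for convenience, and the correlated multivariate normal draw is not reproduced, so the deviates are passed in by the caller.

```python
import math
import random

def clip_to_frequency(eps):
    """Piecewise transform: 0 if eps < 0, eps if 0 <= eps <= 1, 1 if eps > 1."""
    return min(1.0, max(0.0, eps))

def poisson(lam, rng):
    """Poisson deviate by Knuth's multiplication method (fine for lam near 30)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_allele_counts(deviates, rng, lam=30):
    """Return (count of A1, count of A2) per population for one bi-allelic locus.

    deviates holds one normal deviate per population; in the study these come
    from a multivariate normal with a spatial covariance matrix.
    """
    rows = []
    for eps in deviates:
        p = clip_to_frequency(eps)
        # Population size: log-normal value times a Poisson deviate, rounded.
        size = round(rng.lognormvariate(0.0, 1.0) * poisson(lam, rng))
        n_a1 = round(size * p)          # rounding to integers is an assumption
        rows.append((n_a1, size - n_a1))
    return rows

rng = random.Random(1)
# Independent N(0, 1) deviates as a stand-in for the correlated draw.
deviates = [rng.gauss(0.0, 1.0) for _ in range(32)]
counts = simulate_allele_counts(deviates, rng)
```

Because population sizes are drawn independently of the deviates, any spatial pattern in the resulting counts comes only from the deviates supplied by the caller.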
For each spatial structure, we generated 100 matrices of allele frequencies, and each matrix was analysed separately to obtain distributions for Dγ, Dα, Dβ and ΔD. Figure 2 presents heat maps of the correlation in allele frequencies across populations for one simulated data set under each δ value and shows that our algorithm can generate a wide range of genetic structures comparable to those generated by other more complex simulation protocols (e.g., de Villemereuil, Frichot, Bazin, Francois, & Gaggiotti, 2014).
Figure 3 shows the distribution of Dα, Dβ and ΔD values for the three levels of geographic variation below the ecosystem level (i.e., Dγ, genetic diversity). The results clearly show that our framework detects differences in genetic diversity across different levels of spatial genetic structure. As expected, the effective number of alleles (Dα component, top row) increases per region and subregion as the spatial structure becomes weaker (i.e., from small to large δ values) but remains constant at the population level, as there is no spatial structure at this level (i.e., populations are panmictic), so diversity is independent of δ.
The Dβ component (middle row) quantifies the effective number of aggregates (regions, subregions, populations) at each hierarchical level of spatial subdivision. The larger the number of aggregates at a given level, the more heterogeneous that level is. Thus, it is also a measure of compositional dissimilarity at each level. We use this interpretation to describe the results in a more intuitive manner. As expected, as δ increases, dissimilarity between regions (middle left panel) decreases because spatial genetic structure becomes weaker, and the compositional dissimilarity among populations within subregions (middle right panel) increases because the strong spatial correlation among populations within subregions breaks down (Figure 3, centre left panel). The compositional dissimilarity between subregions within regions (Figure 3, middle centre panel) first increases and then decreases with increasing δ. This is due to an “edge effect” associated with the marginal status of the subregions at the extremes of the species range (extreme right and left subregions in Figure 2). As δ increases, the composition of the two subregions at the centre of the species range, which belong to different regions, changes more rapidly than that of the two marginal subregions. Thus, the compositional dissimilarity between subregions within regions increases. However, as δ continues to increase, spatial effects disappear and dissimilarity decreases.
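$$ p_{il} = \begin{cases} 0 & \text{if } \varepsilon_{il} < 0 \\ \varepsilon_{il} & \text{if } 0 \le \varepsilon_{il} \le 1 \\ 1 & \text{if } \varepsilon_{il} > 1 \end{cases} $$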
FIGURE 2 Heat maps of allele frequency correlations between pairs of populations for different δ values. Delta values control the strength of the spatial genetic structure among populations, with low δs having the strongest spatial correlation among populations. Each heat map represents the outcome of a single simulation, and each dot represents the allele frequency correlation between two populations. Thus, the diagonal represents the correlation of a population with itself and is always 1 regardless of the δ value considered in the simulation. Colours indicate range of correlation values. As in Figure 1, the dendrograms represent the spatial relationship (i.e., geographic distance) between populations.
The differentiation component ΔD (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity Dβ. This is expected, as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix
FIGURE 3 Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity gamma is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions).
in which different δ values would be used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength in spatial genetic variation.
For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ: Dγ = 1.6 on average across simulations for δ = 0.1, up to Dγ = 1.9 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here regions, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.
7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM
All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish, Etelis coruscans (Andrews et al., 2014), and a shallow-water fish, Zebrasoma flavescens (Eble et al., 2011).
The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string
FIGURE 4 Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass.
of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km south-west of the MHI at Johnston Atoll and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).
We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.
7.1 | Species diversity
Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, Dγ, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_\alpha^{(2)} = 37.77$; Island: $D_\alpha^{(1)} = 27.75$). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_\beta^{(2)} = 1.29$ represents the number of region equivalents in the Hawaiian archipelago, while $D_\beta^{(1)} = 1.361$ is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain ΔD (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation in terms of the mean proportion of nonshared species is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
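The beta diversities quoted above follow directly from the multiplicative decomposition in Table 1. The short check below uses only the three diversity values reported in Table 4 (48.744, 37.773 and 27.752); the variable names are ours. The differentiation measures ΔD cannot be recovered this way because they also depend on the region and island weights.

```python
# Diversity values of order q = 1 as reported in Table 4.
d_gamma = 48.744          # ecosystem (Hawaiian Archipelago)
d_alpha_region = 37.773   # mean effective number of species per region
d_alpha_island = 27.752   # mean effective number of species per island

# Beta at each level is the ratio of that level's gamma to its alpha;
# the gamma of the island level is the alpha of the region level.
d_beta_region = d_gamma / d_alpha_region
d_beta_island = d_alpha_region / d_alpha_island

# Multiplying alpha by both betas recovers the ecosystem gamma diversity.
recovered_gamma = d_alpha_island * d_beta_island * d_beta_region

print(round(d_beta_region, 3), round(d_beta_island, 3))
```

Rounded to three decimals, the two ratios match the beta values listed in Table 4.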
We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than those of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and to a lesser extent Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
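The effect of the order q on the weight given to rare species can be seen in a minimal sketch of the Hill number. The function name and both abundance vectors are hypothetical and chosen only for illustration; they are not the Hawaiian data.

```python
import math

def hill_number(freqs, q):
    """Hill number (effective number of species/alleles) of order q."""
    freqs = [p for p in freqs if p > 0]
    if abs(q - 1.0) < 1e-12:
        # Limit q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in freqs))
    return sum(p ** q for p in freqs) ** (1.0 / (1.0 - q))

# Perfectly even community: every order gives the same value (4 species).
even = [0.25] * 4

# Hypothetical uneven community: one dominant species plus 20 rare ones.
uneven = [0.551] + [0.449 / 20] * 20

profile_even = [hill_number(even, q) for q in (0, 1, 2)]
profile_uneven = [hill_number(uneven, q) for q in (0, 1, 2)]
```

For the even community the profile is flat, whereas for the uneven one the values drop as q increases, because rare species count fully at q = 0 and progressively less at q = 1 and q = 2.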
Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.
7.2 | Genetic diversity
Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases, genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed
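TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

| Level | Diversity |
| --- | --- |
| 3 Hawaiian Archipelago | $D_\gamma = 48.744$ |
| 2 Region | $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)} = 37.773$; $D_\beta^{(2)} = 1.290$ |
| 1 Island (community) | $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)} = 27.752$; $D_\beta^{(1)} = 1.361$ |

Differentiation among aggregates at each level

| Level | Differentiation |
| --- | --- |
| 2 Region | $\Delta_D^{(2)} = 0.290$ |
| 1 Island (community) | $\Delta_D^{(1)} = 0.153$ |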
among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and therefore may lead to large differences in larval dispersal potential between the two species.
Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.
As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).
8 | DISCUSSION
Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications, including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.
Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and therefore give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g.,
FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities; (b) genetic diversity for Etelis coruscans; (c) genetic diversity for Zebrasoma flavescens.
bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts the Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.
As proved by Chao et al. (2015, Appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population, and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost's D, do not possess either of these two properties.
Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and “true dissimilarity” properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and
FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness; (b) allelic richness for Etelis coruscans; (c) allelic richness for Zebrasoma flavescens.
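TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci.

| Level | Diversity |
| --- | --- |
| 3 Hawaiian Archipelago | $D_\gamma = 8.249$ |
| 2 Region | $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)} = 8.083$; $D_\beta^{(2)} = 1.016$ |
| 1 Island (population) | $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)} = 7.077$; $D_\beta^{(1)} = 1.117$ |

Differentiation among aggregates at each level

| Level | Differentiation |
| --- | --- |
| 2 Region | $\Delta_D^{(2)} = 0.023$ |
| 1 Island (community) | $\Delta_D^{(1)} = 0.062$ |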
our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.
Recently, Karlin and Smouse (2017, Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the “true dissimilarity” property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of “true dissimilarity” can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates but do not measure “true dissimilarity.” Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
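The two-population example above can be checked numerically. The sketch below assumes a two-level hierarchy with equal weights for the two populations, in which case the differentiation measure is the entropy difference divided by the entropy of the weights; the function names and the dictionary layout are ours.

```python
import math

def shannon_entropy(freqs):
    """Shannon entropy -sum p ln p, skipping zero frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def two_level_differentiation(pops, weights):
    """Delta_D = (H_gamma - H_alpha) / (-sum_j w_j ln w_j) for two levels.

    pops is a list of dicts mapping allele label -> relative frequency.
    """
    alleles = sorted(set().union(*pops))
    pooled = [sum(w * pop.get(a, 0.0) for pop, w in zip(pops, weights))
              for a in alleles]
    h_gamma = shannon_entropy(pooled)
    h_alpha = sum(w * shannon_entropy(pop.values())
                  for pop, w in zip(pops, weights))
    normalizer = -sum(w * math.log(w) for w in weights)
    return (h_gamma - h_alpha) / normalizer

# Populations I and II: 10 equally frequent alleles each, 4 of them shared.
pop_1 = {f"a{i}": 0.1 for i in range(10)}       # alleles a0 .. a9
pop_2 = {f"a{i}": 0.1 for i in range(6, 16)}    # alleles a6 .. a15
delta = two_level_differentiation([pop_1, pop_2], [0.5, 0.5])
```

Here the pooled entropy exceeds the mean within-population entropy by 0.6 ln 2 and the normalizer is ln 2, so the measure equals 1 − 4/10, the proportion of nonshared alleles.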
In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would be surely observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity Dβ and differentiation ΔD measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.
The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.
Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural
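TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci.

| Level | Diversity |
| --- | --- |
| 3 Hawaiian Archipelago | $D_\gamma = 8.404$ |
| 2 Region | $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)} = 8.290$; $D_\beta^{(2)} = 1.012$ |
| 1 Island (community) | $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)} = 7.690$; $D_\beta^{(1)} = 1.065$ |

Differentiation among aggregates at each level

| Level | Differentiation |
| --- | --- |
| 2 Region | $\Delta_D^{(2)} = 0.014$ |
| 1 Island (community) | $\Delta_D^{(1)} = 0.033$ |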
populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.
We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides “true diversity” measures that are consistent across levels of organization, and therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
In our hierarchical framework and analysis based on Hill numbers of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al.'s (2010) phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al.'s (2010) measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals, or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, the information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithms may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.
The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.
An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.
ACKNOWLEDGEMENTS
This work was assisted through participation in the "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.
DATA ARCHIVING STATEMENT
All data used in this manuscript are available in DRYAD (https://doi.org/dx.doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).
ORCID
Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493
Christine Edwards http://orcid.org/0000-0001-8837-4872
REFERENCES
Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2
Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101
Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.
Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665
Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.
Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.
Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1 (edited by S. E. Fienberg), 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654
Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551
Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247
Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272
Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45 (edited by D. J. Futuyma), 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540
Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471
Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014
Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819
Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083
Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1
Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2
Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036
Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75
Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352
Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x
Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x
Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1
Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x
Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum × falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.
Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.
Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.
Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3
Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x
Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.
Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x
Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x
Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x
Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.
Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x
Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.
Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.
Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696
Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B: Biological Sciences, 283(1829).
Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z
Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765
Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x
Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.
Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422
Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756
de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.
Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x
Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516
Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.
Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593
APPENDIX S1. Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
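The equivalence described in Appendix S1 can be checked numerically. The following is a minimal Python sketch, not part of the authors' code: the function names are hypothetical, expected heterozygosity is taken as one minus the sum of squared allele frequencies, and the effective number of alleles is taken as 1 / (1 - He), the number of equally frequent alleles that would give the same heterozygosity.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for allele frequencies that sum to 1."""
    return 1.0 - sum(p * p for p in freqs)


def effective_number_of_alleles(freqs):
    """Number of equally frequent alleles giving the same He: 1 / (1 - He)."""
    return 1.0 / (1.0 - expected_heterozygosity(freqs))


# Allele frequencies from the Appendix S1 table (alleles A1..A5).
pop1 = [0.5, 0.5, 0.0, 0.0, 0.0]
pop2 = [0.01, 0.10, 0.665, 0.215, 0.01]

for name, freqs in (("Population 1", pop1), ("Population 2", pop2)):
    he = expected_heterozygosity(freqs)
    ne = effective_number_of_alleles(freqs)
    print(f"{name}: He = {he:.3f}, effective number of alleles = {ne:.2f}")
```

Both populations come out with He close to 0.5 and an effective number close to two, even though population 2 carries five distinct alleles.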
APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol    Definition
$^{q}D$    abundance diversity of order q, also referred to as Hill number of order q
$^{q}PD$    phylogenetic diversity of order q
$S$    total number of distinct elements (species/alleles)
$H$    Shannon entropy
$K$    number of regions
$J_k$    number of local populations/communities within region k
$w_{jk}$    weight given to population/community j of region k
$w_{+k}$    weight given to region k
$D_\alpha^{(l)}$    alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$    beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$    gamma abundance diversity at level l of the hierarchy
$l$    superscript to denote population/community, subregion, region, …
$D_\gamma$    gamma abundance diversity at the ecosystem level
$N_{injk}$    number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$    total number of copies of allele/species i in population/community j of region k
$N_{+jk}$    total number of alleles/individuals in population j of region k
$N_{++k}$    total number of alleles/individuals in region k
$N_{+++}$    total number of alleles/individuals in the ecosystem
$p_{i|jk}$    frequency of allele/species i in population j of region k
$p_{i|+k}$    frequency of allele/species i in region k
$p_{i|++}$    frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$    alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$    alpha entropy for region k
$H_\alpha^{(l)}$    total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$    abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$    number of branch segments in the phylogenetic tree
$L_i$    length of branch i = 1, 2, 3, …, B
$a_i$    total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$    mean branch length
$T$    depth of an ultrametric tree (= $\bar{T}$)
$Q$    Rao's quadratic entropy
$I$    phylogenetic entropy
$a_{i|jk}$    total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$    total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$    total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$    alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$    beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$    gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$    gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$    alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$    alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$    total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$    phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)
APPENDIX S3. Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$
Theorem S3.1. For the gamma and alpha entropies defined above for the two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.
Proof. Since $f(x) = -x \ln x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows. Because $\sum_{m} w_m p_{i|m} \ge w_j p_{i|j}$ for every j and the logarithm is increasing,

$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \ln \Big( \sum_{m=1}^{J} w_m p_{i|m} \Big) \Big) \\
&\le -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \ln \big( w_j p_{i|j} \big) \Big) \\
&= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.
\end{aligned}$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$
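As a rough illustration of Eq. (C1), the following Python sketch computes the gamma entropy, the alpha entropy and their normalized difference for a two-level hierarchy. It is not the authors' iDIP R function: the function names are hypothetical, frequencies are assumed to be supplied as plain lists with alleles in the same order in every population, and at least two populations with positive weight are assumed so that the denominator is non-zero.

```python
import math


def shannon_entropy(freqs):
    """-sum p ln p, skipping zero frequencies (0 ln 0 is taken as 0)."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def shannon_differentiation(pops, weights):
    """Normalized Shannon differentiation of Eq. (C1).

    pops: list of per-population allele relative frequency lists.
    weights: population weights w_j, summing to 1.
    """
    n_alleles = len(pops[0])
    # Pooled (ecosystem) frequencies p_{i|+} = sum_j w_j p_{i|j}.
    pooled = [sum(w * pop[i] for w, pop in zip(weights, pops))
              for i in range(n_alleles)]
    h_gamma = shannon_entropy(pooled)
    h_alpha = sum(w * shannon_entropy(pop) for w, pop in zip(weights, pops))
    h_max = shannon_entropy(weights)  # -sum_j w_j ln w_j
    return (h_gamma - h_alpha) / h_max


identical = [[0.5, 0.5, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
distinct = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.2, 0.8]]
print(shannon_differentiation(identical, [0.3, 0.7]))
print(shannon_differentiation(distinct, [0.3, 0.7]))
```

The two calls exercise the boundary cases of Theorem S3.1, up to floating-point rounding.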
This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.
(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here, the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.
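Part (B) can be illustrated numerically. The sketch below is a Python illustration under stated assumptions, not the authors' code: the helper names are hypothetical, and equal community weights are assumed. It builds N communities that each hold S equally common species, A of which are shared by all, and evaluates the normalized measure of Eq. (C1).

```python
import math


def entropy(freqs):
    """Shannon entropy with 0 ln 0 taken as 0."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def shannon_differentiation(pops, weights):
    """(H_gamma - H_alpha) / (-sum_j w_j ln w_j), as in Eq. (C1)."""
    pooled = [sum(w * p[i] for w, p in zip(weights, pops))
              for i in range(len(pops[0]))]
    h_alpha = sum(w * entropy(p) for w, p in zip(weights, pops))
    return (entropy(pooled) - h_alpha) / entropy(weights)


def equally_common_communities(n_comm, s, a):
    """n_comm communities, each with s equally common species, a shared by all."""
    unshared = s - a
    total = a + n_comm * unshared
    comms = []
    for c in range(n_comm):
        freqs = [0.0] * total
        for i in range(a):                      # species shared by all communities
            freqs[i] = 1.0 / s
        start = a + c * unshared                # species private to community c
        for i in range(start, start + unshared):
            freqs[i] = 1.0 / s
        comms.append(freqs)
    return comms


n_comm, s, a = 3, 10, 4
comms = equally_common_communities(n_comm, s, a)
delta = shannon_differentiation(comms, [1.0 / n_comm] * n_comm)
print(delta, 1 - a / s)
```

With equal weights the gamma entropy works out to ln S + (1 − A/S) ln N and the alpha entropy to ln S, so the printed values should agree up to rounding.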
Proof. For Part (A), see Appendix S6 in Chao, Jost et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, the Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \tag{C2}$$
Here $H_\alpha^{(1)}$ denotes the within-population information, $\big[H_\alpha^{(2)} - H_\alpha^{(1)}\big]$ denotes the among-population information within a region, and $\big[H_\gamma - H_\alpha^{(2)}\big]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in the three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions, with allele relative frequency $p_{i|+k}$ for allele i in region k and region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} w_{+k} p_{i|+k}\Big) \ln\Big(\sum_{m=1}^{K} w_{+m} p_{i|+m}\Big).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.
To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations, with allele relative frequency $p_{i|jk}$ for allele i in population j and population weights $(w_{1k}/w_{+k}, w_{2k}/w_{+k}, \ldots, w_{J_k k}/w_{+k})$, j = 1, 2, …, $J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} p_{i|jk}$. The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$
This proves Eq. (C4). From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$
Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \big(w_{jk}/w_{+k}\big)}. \tag{C6}$$
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\big(w_{jk} p_{i|jk}\big) \\
&= -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big[\ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big] \\
&= -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk} - \sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k} \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big].
\end{aligned}$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
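The three-level decomposition (C2) and the normalized measures (C5) and (C6) can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' iDIP implementation: the function names and the nested-list input layout are assumptions made here, and the weights are assumed to be positive and to sum to 1 over all populations.

```python
import math


def entropy(freqs):
    """Shannon entropy -sum p ln p, with 0 ln 0 taken as 0."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def mix(dists, weights):
    """Weighted average of frequency vectors of equal length."""
    return [sum(w * d[i] for w, d in zip(weights, dists))
            for i in range(len(dists[0]))]


def three_level_differentiation(regions, weights):
    """regions[k][j]: allele frequencies of population j in region k.
    weights[k][j]: w_jk, summing to 1 over all populations and regions.
    Returns (among-region measure of Eq. C5, within-region measure of Eq. C6).
    """
    w_region = [sum(wk) for wk in weights]                      # w_{+k}
    p_region = [mix(pops, [w / wr for w in wk])                 # p_{i|+k}
                for pops, wk, wr in zip(regions, weights, w_region)]
    p_total = mix(p_region, w_region)                           # p_{i|++}

    h_gamma = entropy(p_total)
    h_alpha2 = sum(wr * entropy(p) for wr, p in zip(w_region, p_region))
    h_alpha1 = sum(w * entropy(p)
                   for pops, wk in zip(regions, weights)
                   for p, w in zip(pops, wk))

    max2 = entropy(w_region)                                    # bound in (C3)
    max1 = -sum(w * math.log(w / wr)                            # bound in (C4)
                for wk, wr in zip(weights, w_region) for w in wk if w > 0)
    return (h_gamma - h_alpha2) / max2, (h_alpha2 - h_alpha1) / max1


# Hypothetical data: two regions holding two and three populations.
weights = [[0.1, 0.2], [0.3, 0.25, 0.15]]
distinct = [[[1, 0, 0, 0, 0], [0, 1, 0, 0, 0]],
            [[0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]]
identical = [[[0.2, 0.8]] * 2, [[0.2, 0.8]] * 3]
print(three_level_differentiation(distinct, weights))
print(three_level_differentiation(identical, weights))
```

The first call corresponds to the special case worked out above, in which every population is completely distinct; the second uses identical populations.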
As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[I_\alpha^{(2)} - I_\alpha^{(1)}\big] + \big[I_\gamma - I_\alpha^{(2)}\big]. \tag{C7}$$
Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $\big[I_\alpha^{(2)} - I_\alpha^{(1)}\big]$ denotes the among-population phylogenetic information within a region, and $\big[I_\gamma - I_\alpha^{(2)}\big]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
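To make the phylogenetic bound concrete, the following Python sketch evaluates the two-level analogue of (C8), that is, $(I_\gamma - I_\alpha) / (-T \sum_j w_j \ln w_j)$, on a small ultrametric tree. Everything specific here is an assumption for illustration: the topology ((1,2),3),(4,5), the branch lengths, the depth T = 3 and the function names are hypothetical and are not taken from the authors' data or code.

```python
import math

# Hypothetical ultrametric tree of depth 3 with topology ((1,2),3),(4,5).
# Each entry is (branch length L_i, set of tip indices descended from the branch).
BRANCHES = [
    (1.0, {0}), (1.0, {1}), (2.0, {2}), (1.5, {3}), (1.5, {4}),
    (1.0, {0, 1}), (1.0, {0, 1, 2}), (1.5, {3, 4}),
]
TREE_DEPTH = 3.0


def phylogenetic_entropy(tip_freqs):
    """I = -sum_i L_i a_i ln a_i, with a_i the summed frequency of descendant tips."""
    total = 0.0
    for length, tips in BRANCHES:
        a = sum(tip_freqs[t] for t in tips)
        if a > 0:
            total -= length * a * math.log(a)
    return total


def phylogenetic_differentiation(pops, weights):
    """(I_gamma - I_alpha) / (-T sum_j w_j ln w_j) for a two-level hierarchy."""
    pooled = [sum(w * p[i] for w, p in zip(weights, pops))
              for i in range(len(pops[0]))]
    i_gamma = phylogenetic_entropy(pooled)
    i_alpha = sum(w * phylogenetic_entropy(p) for w, p in zip(weights, pops))
    i_max = -TREE_DEPTH * sum(w * math.log(w) for w in weights if w > 0)
    return (i_gamma - i_alpha) / i_max


# No shared branches: one population in clade (1,2,3), the other in clade (4,5).
no_shared = [[0.2, 0.3, 0.5, 0.0, 0.0], [0.0, 0.0, 0.0, 0.6, 0.4]]
# Distinct tips 1 and 2, but the populations share the two ancestral branches.
sister_tips = [[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0]]
print(phylogenetic_differentiation(no_shared, [0.5, 0.5]))
print(phylogenetic_differentiation(sister_tips, [0.5, 0.5]))
```

The second case shows why the maximum requires no shared branches: two populations with different tips but a recent common ancestor are only partly differentiated phylogenetically.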
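SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represents the ecosystem. Shannon entropies $H$ are calculated from allele/species abundances at each level of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1: diagram with columns "Within", "Between", "Total" and "Decomposition" and rows for level 3 (Ecosystem), level 2 (Region) and level 1 (Population or Community). Populations 1 to 3 (frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$; entropies $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$) belong to region 1 ($p_{i|+1}$, $H_{\alpha 1}^{(2)}$); populations 4 and 5 ($p_{i|42}$, $p_{i|52}$; $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$) belong to region 2 ($p_{i|+2}$, $H_{\alpha 2}^{(2)}$); the ecosystem has frequencies $p_{i|++}$. Level totals are $H_\alpha^{(1)}$, $H_\alpha^{(2)}$ and $H_\gamma$.]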
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $(a_1, a_2, \ldots, a_5, a_6, a_7, a_8) = (p_1, p_2, \ldots, p_5, p_1 + p_2, p_1 + p_2 + p_3, p_4 + p_5)$, where $p_i$ is the relative abundance of species i.

[Figure S2: ultrametric tree with tips $p_1$ to $p_5$, internal nodes labelled $p_1 + p_2$, $p_4 + p_5$ and $p_1 + p_2 + p_3$, and the root labelled MRCA.]
3
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as follows:

Ecosystem
  Region 1: Population 1, Population 2
  Region 2: Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):
Pop1 Pop2 Pop3 Pop4 Pop5
Allele1 1 16 2 10 15
Allele2 0 0 0 5 14
Allele3 7 12 11 1 0
Allele4 0 5 14 1 21
Allele5 2 1 0 11 10
Allele6 0 1 3 2 0
The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure with any number of levels can be expressed in a similar manner.
Pop1 Pop2 Pop3 Pop4 Pop5
Level3 Ecosystem Ecosystem Ecosystem Ecosystem Ecosystem
Level2 Region1 Region1 Region2 Region2 Region2
Level1 Population1 Population2 Population3 Population4 Population5
For the above simple example, the input data for the R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3), c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
Struc = rbind(rep("Ecosystem", 5), c(rep("Region1", 2), rep("Region2", 3)), paste0("Population", 1:5))

# Run R function
iDIP(Data, Struc)
Output is given below.

                  [,1]
D_gamma          5.272
D_alpha2         4.679
D_alpha1         3.540
D_beta2          1.127
D_beta1          1.322
Proportion2      0.300
Proportion1      0.700
Differentiation2 0.204
Differentiation1 0.310
We give simple interpretations for the above output (every "effective" quantity is in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma, up to rounding).

(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2, up to rounding).

(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
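The relationships among the reported quantities can be checked directly from the rounded output. The sketch below is only an illustration: the vector name `out` and the helper `beta_info` are hypothetical, and small discrepancies in the last digit are expected because the inputs are rounded to three decimals. It relies on the fact that, in the iDIP code, each proportion is the share of the total beta information (differences of entropies, i.e., logarithms of beta diversities) found at that level.

```r
# Rounded values copied from the example output (hypothetical object name).
out <- c(D_gamma = 5.272, D_alpha2 = 4.679, D_alpha1 = 3.540,
         D_beta2 = 1.127, D_beta1 = 1.322)

# Multiplicative chain: alpha1 * beta1 = alpha2 and alpha2 * beta2 = gamma.
chain_alpha2 <- out[["D_alpha1"]] * out[["D_beta1"]]
chain_gamma  <- out[["D_alpha2"]] * out[["D_beta2"]]

# Share of the total beta information (log scale) found at each level.
beta_info  <- log(out[c("D_beta2", "D_beta1")])
proportion <- beta_info / sum(beta_info)

print(round(c(chain_alpha2 = chain_alpha2, chain_gamma = chain_gamma), 3))
print(round(proportion, 2))
```

The two products should agree with D_alpha2 and D_gamma to within rounding, and the two proportions should be close to the reported Proportion2 and Proportion1.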
# R code for Information-based Diversity Partitioning for Any Number of Levels
#
# Input data:
#   abun:  alleles-by-population frequency matrix (or species-by-community frequency matrix)
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix)
# Output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level
#   (2) Proportion of total beta information found at each level
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank" or "NA"
#       in your data by 0 or any numerical value. Also, there must be at least one
#       species/allele in a population or a community.

# Main function iDIP
iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga > 0] / n
  G = sum(-gp * log(gp))                      # gamma (ecosystem) Shannon entropy
  S = length(gp)
  H = nrow(struc)                             # number of hierarchical levels
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  # Lowest level: populations (or communities)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H - 1] = sum(wi * Ai)

  # Intermediate levels: pool the populations that belong to each aggregate
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          ai[, j] = abun[, II]
        } else {
          ai[, j] = rowSums(abun[, II])
        }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i - 1] = sum(wi * Ai)
    }
  }

  total = G - A[H - 1]                        # total beta information
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }

  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha", (H - 1):1),
                     paste0("D_beta", (H - 1):1),
                     paste0("Proportion", (H - 1):1),
                     paste0("Differentiation", (H - 1):1))
  return(out)
}
Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) described for the species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see the simple example given above for the species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.

# Input data
Data = cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3), c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
rownames(Data) = paste0("Allele", 1:6)
Struc = rbind(rep("Ecosystem", 5), c(rep("Region1", 2), rep("Region2", 3)), paste0("Community", 1:5))
Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

# Run R function
iDIPphylo(Data, Struc, Tree)
Output is given below.

              [,1]
Faith's PD 321.525
mean_T      94.169
PD_gamma   274.388
PD_alpha2  255.194
PD_alpha1  223.231
PD_beta2     1.075
PD_beta1     1.143
PD_prop2     0.351
PD_prop1     0.649
PD_diff2     0.124
PD_diff1     0.149
The term "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma, up to rounding).

(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2, up to rounding).

(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
# R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
#
# Input data:
#   abun:  alleles-by-population frequency matrix (or species-by-community frequency matrix)
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix)
#   tree:  a Newick-format phylogenetic tree spanned by all focal species considered in a study
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your
#       data by 0 or any numerical value. Also, there must be at least one species/allele
#       in a population or a community.
# Output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root node to
#       each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information
#       found in Level 1 and Level 2, respectively
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1
#       measures the mean phylogenetic dissimilarity among communities (Level 1) within a
#       region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among
#       regions (Level 2), etc.

# Three packages ("ade4", "ape" and "phytools") must be installed first
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

# Main function iDIPphylo
iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, b] = 1 if branch b lies on the path from the root to leaf i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }

  # Node/branch abundances for each population and the branch lengths
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
  pB = c(phyloData$leaves, phyloData$nodes)

  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n; TT = sum(gp * pB)              # TT: abundance-weighted mean branch length
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])                        # Faith's PD

  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H - 1] = sum(wi * Ai)

  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else {
          pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
        }
      }
      # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = ni / sum(ni)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i - 1] = sum(wi * Ai)
    }
  }

  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }

  # G and A are computed on a_i / TT, so exp(G) equals TT * exp(I / TT) (Equation 5)
  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  # out1 = iDIP(abun, struc); out = cbind(out1, out2)
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H - 1):1),
                     paste0("PD_beta", (H - 1):1),
                     paste0("PD_prop", (H - 1):1),
                     paste0("PD_diff", (H - 1):1))
  return(out)
}
ecology, community genetics and eco-evolutionary dynamics are helping to foster a convergence between the methods used to measure species and genetic diversity. Indeed, in the last decade population geneticists have begun to extend the use of popular species diversity metrics to the measurement of genetic diversity by deriving mathematical expressions linking them with evolutionary parameters such as effective population size and mutation and migration rates (Chao et al., 2015; Sherwin, 2010; Sherwin et al., 2006; Smouse et al., 2015).

Regardless of this very recent methodological convergence, ecologists and population geneticists face the same challenges when trying to characterize how diversity components (alpha, beta) are structured geographically. These problems have been described in great detail in the literature (e.g., see Jost, 2007, 2010), so here we will only give a very brief summary. The first problem is that the commonly used within-community and within-population abundance diversity measures (e.g., Shannon-Wiener index and heterozygosity) are in fact entropies, meaning that they quantify the uncertainty in the species or allele identity of randomly sampled individuals or alleles, respectively. Importantly, these indices do not scale linearly with an increase in diversity, and some of them (e.g., heterozygosity) reach an asymptote for large values. The second problem is that the "within-" (alpha) and "between-" (beta) components of diversity are not independent. Intuitively, if beta depends on alpha, it would be impossible to compare beta diversities across all levels at which alpha diversities differ.

Partitioning components of diversity is central to progress on these problems. Ecologists have related the traditional alpha, beta and gamma diversity using both additive and multiplicative schemes of partitioning. On the other hand, population geneticists have always used the multiplicative scheme based on the partitioning of the probability of identity by descent of pairs of alleles (inbreeding coefficients, F). Although there has been some confusion (cf. Jost, 2008; Jost et al., 2010; Meirmans & Hedrick, 2011), it is easy to demonstrate that all estimators of FST, a parameter that quantifies genetic structure, including GST (Nei, 1973) and θ (Weir & Cockerham, 1984), are based on the well-known multiplicative decomposition of Wright's (1951) F-statistics, (1 − FIT) = (1 − FIS)(1 − FST), where all terms are entropy measures describing the uncertainty in the identity by descent of pairs of alleles when they are sampled from the whole set of populations (metapopulation) (1 − FIT), from within the same population (1 − FIS) or from two different populations (1 − FST).

As mentioned earlier, ecologists engaged in intense debates on how to partition species diversity, but in a recent Ecology forum (Ellison, 2010) contributors agreed that a first step towards reaching a consensus was to adopt Hill numbers to measure diversity. Discussions among population geneticists are less advanced because of their traditional focus on the use of genetic polymorphism data to estimate important evolutionary parameters, which requires that genetic diversity statistics be effective measures of the causes and consequences of genetic differentiation (e.g., Whitlock, 2011). Much theoretical work is still needed to demonstrate that diversity measures based on information theory do satisfy this requirement. Here, instead, we argue that the adoption of Hill numbers in population genetics is also a good starting point to reach a consensus on how to partition genetic diversity. In what follows, we first introduce Hill numbers and then present a weighted information-based decomposition framework applicable to both community and population genetics studies.
3 | OVERVIEW OF HILL NUMBERS
There are now many articles describing the application of Hill numbers. Here we follow Jost (2006), who reintroduced their use in ecology. As Jost (2006) noted, most diversity indices are in fact entropies that measure the uncertainty in the identity of species (or alleles) in a sample. However, true diversity measures should provide estimates of the number of distinct elements (species or alleles) in an aggregate (community or population). To derive such measures, we first note that diversity indices create equivalence classes among aggregates, in the sense that all aggregates with the same diversity index value can be considered as equivalent. For example, all populations with the same heterozygosity value are equivalent in terms of this index, even if they have radically different allele frequencies (see Appendix S1 for an example). Moreover, for any given heterozygosity there will be an "ideal" population in which all alleles are equally frequent. It is therefore possible to define an "effective number of elements" (alleles in this example) as the number of equally frequent elements in an "ideal aggregate" that has the same diversity index value as the "real aggregate". An example of effective number in an ecological context is the effective number of species introduced by MacArthur (1965), while an equivalent concept in population genetics is the effective number of alleles (Kimura & Crow, 1964).

Note that the concept of effective population size, Ne, used in population genetics is only loosely analogous to that of Hill numbers and rests on a rather different rationale. More precisely, Ne is defined as the number of individuals in an ideal (Wright–Fisher) population that has the same magnitude of random genetic drift as the real population being studied. There are different ways in which we can measure the strength of genetic drift, the most common being change in average inbreeding coefficient, change in allele frequency variance and rate of loss of heterozygosity, and each leads to a different type of effective size. Thus, the ideal and the real populations are equivalent in terms of the rate of loss of genetic diversity and not in terms of equal representation of distinct individuals. Probably the only similarity between Ne and the rationale underlying Hill numbers is that all the individuals in the ideal population contribute equally (on average) to the gene pool of the next generation.
The application of the above-stated logic to any of the many different entropy measures used in ecology and population genetics yields a single expression for diversity:

$$ {}^{q}D \equiv \left( \sum_{i=1}^{S} p_i^{q} \right)^{1/(1-q)} \qquad (1) $$

where S denotes the number of species or alleles, $p_i$ denotes the relative abundance or frequency of species or allele i, and the exponent and superscript q is the order of the diversity, which indicates the sensitivity of ${}^{q}D$ (the numbers equivalent of the diversity measure being used) to common and rare elements (Jost, 2006). The diversity of order zero (q = 0) is completely insensitive to species or allele frequencies and is known as species or allelic richness, depending on whether it is applied to species or allele frequency data. The diversity of order one (q = 1) weights the contribution of each species or allele by its frequency, without favouring either common or rare species/alleles. Although Equation 1 is not defined for q = 1, its limit exists (Jost, 2006) and is given by Equation 2, where H is the Shannon entropy. All values of q greater than unity disproportionally favour the most common species or allele. For example, the Simpson concentration and the Gini–Simpson index, which are respectively equivalent to expected homozygosity and expected heterozygosity when applied to allele frequency data, lead to diversities of order 2 (Equation 3) and give the same effective number of species or alleles.

It is worth emphasizing that, among all these different number equivalents or true diversity measures, the diversity of order 1 is key because of its ability to weigh elements precisely by their frequency, without favouring either rare or common elements (Jost, 2006). Therefore, we will use this measure to define our new framework for diversity decomposition.
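To make the role of the order q concrete, the following minimal R sketch evaluates Equation 1, using the exponential of Shannon entropy as the limit at q = 1. The function name `hill_number` and the two frequency vectors are illustrative assumptions and are not part of the iDIP code.

```r
# Hill number (effective number of species or alleles) of order q.
# `p` holds counts or relative frequencies; zero entries are ignored.
hill_number <- function(p, q) {
  p <- p[p > 0] / sum(p)
  if (isTRUE(all.equal(q, 1))) {
    return(exp(-sum(p * log(p))))      # limit q -> 1: exp(Shannon entropy)
  }
  sum(p^q)^(1 / (1 - q))
}

even   <- rep(1 / 4, 4)                # four equally frequent alleles
skewed <- c(0.7, 0.1, 0.1, 0.1)        # same richness, one dominant allele

# For the even aggregate every order returns the number of alleles.
even_profile   <- sapply(c(0, 1, 2), function(q) hill_number(even, q))
# For the skewed aggregate the effective number falls as q increases.
skewed_profile <- sapply(c(0, 1, 2), function(q) hill_number(skewed, q))

print(even_profile)
print(skewed_profile)
```

For the skewed vector, the order-2 value equals the inverse of the Simpson concentration, 1/(0.49 + 3 × 0.01), which illustrates how orders above one are dominated by the most common element.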
4 | WEIGHTED INFORMATION-BASED DECOMPOSITION FRAMEWORK (q = 1)

Our decomposition framework is focused on the information-based diversity measure (Hill number of order q = 1). In what follows, we first describe the framework in terms of abundance (species/genetic) diversities and then provide an equivalent formulation in terms of phylogenetic diversity. For simplicity, we will use the notation D to refer to abundance diversities and PD to refer to phylogenetic diversities, both of order q = 1. Appendix S2 lists all notation and definitions of the parameters and variables we used.

4.1 | Formulation in terms of abundance diversity

Here we develop a framework applicable to both species (abundance, presence–absence, biomass) and genetic data to estimate alpha, beta and gamma diversities (i.e., diversity components) across different levels of a hierarchical spatial structure. In this section, we consider a very simple example of an ecosystem subdivided into multiple regions, each of which in turn is subdivided into a number of communities when considering species data, or a number of populations when considering genetic data. However, our formulation is applicable to any number of levels within a spatially hierarchical partitioning scheme, and to their associated number of communities and populations at each level (nested scale), such as the example considered in our simulation study below (see Figure 1). Indeed, the framework described here allows decomposing species and genetic information on an equal footing, thus allowing diversity components to be contrasted across communities and populations. In other words, if genetic and species abundance (or presence–absence) data are available for every population and every species, then genetic and species diversity components can be contrasted within and among spatial scales, as well as across different phylogenetic levels. Note that our proposed framework is based on diversities of order q = 1, which are less sensitive than diversities of higher order to the fact that genetic information is not available for all individuals in a population but is rather based on subsamples of individuals within populations. As such, using q = 1 allows one to decompose genetic variation consistently across different spatial subdivision levels that may vary in abundance.

The final objective is to decompose the global (ecosystem) diversity into its regional and community/population-level components. We do this using the well-known additive property of Shannon entropy across hierarchical levels (and thus multiplicative partitioning of diversity) (Batty, 1976; Jost, 2007). Table 1 presents the diversities (number equivalents) that need to be estimated at each level of the hierarchy. For each level, there will be one value corresponding to species diversity and another corresponding to allelic (genetic) diversity of a particular species at a given locus (or an average across loci). Figure S1 provides a schematic representation of the calculation of diversities.

From Table 1, it is apparent that we only need to use Equation 2 to calculate three diversity indices, namely $D_{\alpha}^{(1)}$, $D_{\alpha}^{(2)}$ and $D_{\gamma}$. These diversity measures are defined in terms of relative abundances of the distinct elements (species or alleles) at the respective levels of the hierarchy. In what follows, we first present the framework as applied to allele count data and then explain how a simple change in the definition of a single parameter allows the application of the same framework to species abundance data. We assume that we are considering a diploid species (but the scheme can be easily generalized for polyploid species) and focus on the diversity of order q = 1, which is based on the Shannon entropy (see Equation 2).
Genetic diversity indices are calculated separately for each locus, so we focus here on a locus with S alleles. Additionally, we consider an ecosystem subdivided into K regions, each having $J_k$ local populations. Let $N_{injk}$ be the number of diploid individuals with n (= 0, 1, 2) copies of allele i in population j and region k. Then the total number of copies of allele i in population j and region k is $N_{ijk} = \sum_{n=0}^{2} n N_{injk}$, and from this we can derive the total number of alleles in population j and region k as $N_{+jk} = \sum_{i=1}^{S} N_{ijk}$, the total number of alleles in region k as $N_{++k} = \sum_{j=1}^{J_k} N_{+jk}$, and the total number of alleles in the ecosystem as $N_{+++} = \sum_{k=1}^{K} N_{++k}$.
$$ {}^{1}D = \exp\left( -\sum_{i=1}^{S} p_i \ln p_i \right) = \exp(H) \qquad (2) $$

$$ {}^{2}D = 1 \Big/ \left( \sum_{i=1}^{S} p_i^{2} \right) \qquad (3) $$

All allele frequencies can be derived from these allele counts. For example, the relative frequency of allele i in any given population j within region k is $p_{i|jk} = N_{ijk}/N_{+jk}$. In the case of region- and ecosystem-level allele frequencies, we pool over populations within regions and over all regions and populations within an ecosystem, respectively. We define the weight for population j and region k as $w_{jk} = N_{+jk}/N_{+++}$; the weight for region k thus becomes $w_{+k} = \sum_{j=1}^{J_k} w_{jk} = N_{++k}/N_{+++}$. Table 2 describes how allele/species relative frequencies at each level are calculated in terms of these weight functions.
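As an illustration of these definitions, the short R sketch below computes the population and region weights from a small matrix of allele counts and checks that pooling counts within a region gives the same region-level frequencies as the weighted average of population frequencies described in Table 2. The toy counts, the `region` vector and all object names are hypothetical.

```r
# Allele counts N_ijk: rows = alleles, columns = populations (toy data).
counts <- cbind(Pop1 = c(4, 6, 0), Pop2 = c(10, 5, 5), Pop3 = c(0, 8, 12))
region <- c("R1", "R1", "R2")                      # region of each population

w_jk  <- colSums(counts) / sum(counts)             # population weights N_+jk / N_+++
w_k   <- sapply(split(w_jk, region), sum)          # region weights w_+k
p_pop <- sweep(counts, 2, colSums(counts), "/")    # population frequencies p_i|jk

# Region-level frequencies obtained by pooling counts within each region ...
pooled <- sapply(split(seq_along(region), region), function(cols) {
  rowSums(counts[, cols, drop = FALSE]) / sum(counts[, cols])
})

# ... and as the weighted average of population frequencies (Table 2).
weighted <- sapply(names(w_k), function(r) {
  cols <- which(region == r)
  as.vector(p_pop[, cols, drop = FALSE] %*% (w_jk[cols] / w_k[[r]]))
})

print(w_k)
print(all.equal(unname(pooled), unname(weighted)))
```

Because the weights are proportional to the number of alleles sampled, the two routes coincide; with other weighting choices they would generally differ.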
Using these frequencies, we can calculate the genetic diversities at each level of spatial organization. Table 3 presents the formulas for $D_{\alpha}^{(1)}$, $D_{\alpha}^{(2)}$ and $D_{\gamma}$; all other diversity measures can be derived from them (see Table 1). In the case of the ecosystem diversity, this amounts to simply replacing $p_i$ in Equation 2 by $p_{i|++}$, the allele frequency at the ecosystem level (see Table 2). To calculate the diversity at the regional level, we first calculate the entropy $H_{\alpha k}^{(2)}$ for each individual region k and then obtain the weighted average over all regions, $H_{\alpha}^{(2)}$. Finally, we calculate the exponential of the region-level entropy to obtain $D_{\alpha}^{(2)}$, the alpha diversity at the regional level. We proceed in a similar fashion to obtain $D_{\alpha}^{(1)}$, the diversity at the population level, but in this case we need to average over regions and populations within regions.
The calculation of the equivalent diversities based on species count data can be carried out using the exact same procedure described above, but in this case $N_{ijk}$ represents the number of individuals of species i in population j and region k. All formulas for gamma, alpha and beta, along with the differentiation measures at each level, are given in Table 3. The formulas can be directly generalized to any arbitrary number of levels (see Section 5).
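The three-level calculation described above can be sketched in a few lines of R. This is an illustrative re-implementation under stated assumptions, not the iDIP function itself: the helper names `shannon` and `ent` are hypothetical, the allele counts are those of the worked example in the supplementary R guide, and the differentiation measures are taken to be the excess entropy at each level divided by its maximum possible value given the weights.

```r
# Shannon entropy of a vector of counts (zeros ignored).
shannon <- function(x) { p <- x[x > 0] / sum(x); -sum(p * log(p)) }
# Entropy of a vector of strictly positive weights.
ent <- function(w) -sum(w * log(w))

# Allele counts: alleles in rows, populations in columns.
counts <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
                c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)                        # region of each population

w_pop      <- colSums(counts) / sum(counts)       # population weights w_jk
reg_counts <- t(rowsum(t(counts), region))        # pool populations within regions
w_reg      <- colSums(reg_counts) / sum(reg_counts)   # region weights w_+k

H_gamma  <- shannon(rowSums(counts))
H_alpha2 <- sum(w_reg * apply(reg_counts, 2, shannon))
H_alpha1 <- sum(w_pop * apply(counts, 2, shannon))

D_gamma  <- exp(H_gamma)
D_alpha2 <- exp(H_alpha2)
D_alpha1 <- exp(H_alpha1)
D_beta2  <- D_gamma / D_alpha2                    # region equivalents
D_beta1  <- D_alpha2 / D_alpha1                   # population equivalents per region

Delta2 <- (H_gamma - H_alpha2) / ent(w_reg)
Delta1 <- (H_alpha2 - H_alpha1) / (ent(w_pop) - ent(w_reg))

print(round(c(D_gamma = D_gamma, D_alpha2 = D_alpha2, D_alpha1 = D_alpha1,
              D_beta2 = D_beta2, D_beta1 = D_beta1,
              Delta2 = Delta2, Delta1 = Delta1), 3))
```

By construction, the product of the population-level alpha diversity and the two beta diversities returns the gamma diversity, which is the multiplicative decomposition summarized in Table 1.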
4.2 | Formulation in terms of phylogenetic diversity

We first present an overview of phylogenetic diversity measures applied to a single nonhierarchical case (henceforth referred to as a single aggregate, for brevity) and then extend it to consider a hierarchically structured system.

4.2.1 | Phylogenetic diversity measures in a single aggregate

To formulate phylogenetic diversity in a single aggregate, we assume that all species or alleles in an aggregate are connected by a rooted ultrametric or nonultrametric phylogenetic tree, with all species/alleles as tip nodes. All phylogenetic diversity measures discussed below are computed from a given fixed tree base, or a time reference point, that is ancestral to all species/alleles in the aggregate. A convenient time reference point is the age of the root of the phylogenetic tree spanned by all elements. Assume that there are B branch segments in the tree and thus there are B corresponding nodes, B ≥ S. The set of species/alleles is expanded to include also the internal nodes as well as the terminal nodes representing species/alleles, which will then be the first S elements (see Figure S2).

FIGURE 1 The spatial representation of 32 populations organized into a spatial hierarchy based on three scale levels: subregions (eight populations each), regions (16 populations each) and the ecosystem (all 32 populations). The dendrogram (upper panel: hierarchical representation of levels) represents the spatial relationship (i.e., geographic distance), in which each tip represents a population found in a particular site (lower panel). The cartographic representation (lower panel) represents the spatial distribution of these same populations along a geographic coordinate system.
Let $L_i$ denote the length of branch i in the tree, i = 1, 2, …, B. We first expand the set of relative abundances of elements $(p_1, p_2, \dots, p_S)$ (see Equation 1) to a larger set $\{a_i, i = 1, 2, \dots, B\}$ by defining $a_i$ as the total relative abundance of the elements descended from the ith node/branch, i = 1, 2, …, B. In phylogenetic diversity, an important parameter is the mean branch length $\bar{T}$, the abundance-weighted mean of the distances from the tree base to each of the terminal branch tips, that is, $\bar{T} = \sum_{i=1}^{B} L_i a_i$. For an ultrametric tree, the mean branch length simply reduces to the tree depth T; see Figure 1 in Chao, Chiu, and Jost (2010) for an example. For simplicity, our following formulation of phylogenetic diversity is based on ultrametric trees. The extension to nonultrametric trees is straightforward (via replacing T by $\bar{T}$ in all formulas).
Chao et al. (2010, 2014) generalized Hill numbers to a class of phylogenetic diversity measures of order q, ${}^{q}PD$, defined in Equation 4. This measure quantifies the effective total branch length during the time interval from T years ago to the present. If q = 0, then ${}^{0}PD = \sum_{i=1}^{B} L_i$, which is the well-known Faith's PD, the sum of the branch lengths of a phylogenetic tree connecting all species. However, this measure does not consider species abundances. Rao's quadratic entropy Q (Rao & Nayak, 1985) is a widely used measure which takes into account both phylogeny and species abundances. This measure is a generalization of the Gini–Simpson index and quantifies the average phylogenetic distance between any two individuals randomly selected from the assemblage. Chao et al. (2010) showed that the ${}^{q}PD$ measure of order q = 2 is a simple transformation of quadratic entropy, that is, ${}^{2}PD = T/(1 - Q/T)$. Again, here we focus on the ${}^{q}PD$ measure of order q = 1, which can be expressed as a function of the phylogenetic entropy (Allen, Kon, & Bar-Yam, 2009), as given in Equation 5. Here I denotes the phylogenetic entropy (Equation 6), which is a generalization of Shannon's entropy that incorporates phylogenetic distances among elements. Note that when there are only tip nodes and all branches have unit length, then we have T = 1 and ${}^{q}PD$ reduces to the Hill number of order q (in Equation 1).
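A small numerical sketch may help to fix the notation. The R code below builds the expanded abundance set for a five-tip ultrametric tree with the same topology as the example in Figure S2, then evaluates the mean branch length, the phylogenetic entropy and the order-1 phylogenetic diversity. The tip abundances, branch lengths and object names (`member`, `pd1`) are hypothetical values chosen only for illustration.

```r
# Tip relative abundances p_1, ..., p_5 (hypothetical).
p <- c(0.30, 0.20, 0.10, 0.25, 0.15)

# Rows: the 8 branches (5 tips, then 3 internal branches); columns: tips.
# An entry of 1 means that the tip descends from that branch.
member <- rbind(diag(5),
                c(1, 1, 0, 0, 0),      # branch above the (1,2) node
                c(1, 1, 1, 0, 0),      # branch above the (1,2,3) node
                c(0, 0, 0, 1, 1))      # branch above the (4,5) node
L <- c(1, 1, 2, 3, 3, 1, 2, 1)         # branch lengths; every root-to-tip path sums to 4

a <- as.vector(member %*% p)           # expanded abundance set a_1, ..., a_8

# Order-1 phylogenetic diversity: T * exp(I / T), with I the phylogenetic entropy.
pd1 <- function(L, a) {
  Tbar <- sum(L * a)
  I    <- -sum(L * a * log(a))
  Tbar * exp(I / Tbar)
}

Tbar  <- sum(L * a)                    # mean branch length (tree depth here)
PD1   <- pd1(L, a)
PD0   <- sum(L)                        # Faith's PD (order q = 0)
hill1 <- pd1(rep(1, 5), p)             # tips only, unit branch lengths

print(c(Tbar = Tbar, PD1 = PD1, PD0 = PD0, hill1 = hill1))
```

With tips only and unit branch lengths, the same function returns the exponential of Shannon entropy, in line with the reduction to Hill numbers noted in the text.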
4.2.2 | Phylogenetic diversity decomposition in a multiple-level hierarchically structured system

The single-aggregate formulation can be extended to consider a hierarchical spatially structured system. For the sake of simplicity, we consider three levels (ecosystem, region and community/population), as we did for the species/allelic diversity decomposition. Assume that there are S elements in the ecosystem. For the rooted phylogenetic tree spanned by all S elements in the ecosystem, we define the root (or a time reference point), the number of nodes/branches B and the branch length $L_i$ in a similar manner as those in a single aggregate.

For the tip nodes, as in the framework of species and allelic diversity (in Table 2), define $p_{i|jk}$, $p_{i|+k}$ and $p_{i|++}$, i = 1, 2, …, S, as the ith species or allele relative frequencies at the population, regional and ecosystem level, respectively. To expand these relative frequencies to the branch set, we define $a_{i|jk}$, i = 1, 2, …, B, as the summed relative abundance of the species/alleles descended from the ith node/branch in population j and region k, with similar definitions for $a_{i|+k}$ and $a_{i|++}$, i = 1, 2, …, B; see Figure 1 of Chao et al. (2015) for an illustrative example. The decomposition for phylogenetic diversity is similar to that for Hill numbers presented in Table 1, except that now all measures are replaced by phylogenetic diversity. The corresponding phylogenetic gamma, alpha and beta diversities at each level are
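$$ {}^{q}PD = \left[ \sum_{i=1}^{B} L_i \left( \frac{a_i}{T} \right)^{q} \right]^{1/(1-q)} \qquad (4) $$

$$ {}^{1}PD = \lim_{q \to 1} {}^{q}PD = \exp\left[ -\sum_{i=1}^{B} L_i \frac{a_i}{T} \ln\left( \frac{a_i}{T} \right) \right] \equiv T \exp(I/T) \qquad (5) $$

$$ I = -\sum_{i=1}^{B} L_i a_i \ln a_i \qquad (6) $$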
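TABLE 1 Various diversities in a hierarchically structured system and their decomposition based on diversity measure D = 1D (Hill number of order q = 1 in Equation 2); for phylogenetic diversity decomposition, replace D with PD = 1PD (phylogenetic diversity measure of order q = 1 in Equation 5); see Table 3 for all formulas for D and PD. The superscripts (1) and (2) denote the hierarchical level of focus.

Hierarchical level | Diversity: Within | Diversity: Between | Diversity: Total | Decomposition
3 Ecosystem | − | − | $D_{\gamma}$ | $D_{\gamma} = D_{\alpha}^{(1)} D_{\beta}^{(1)} D_{\beta}^{(2)}$
2 Region | $D_{\alpha}^{(2)}$ | $D_{\beta}^{(2)} = D_{\gamma}^{(2)} / D_{\alpha}^{(2)}$ | $D_{\gamma}^{(2)} = D_{\gamma}$ | $D_{\gamma} = D_{\alpha}^{(2)} D_{\beta}^{(2)}$
1 Community or population | $D_{\alpha}^{(1)}$ | $D_{\beta}^{(1)} = D_{\gamma}^{(1)} / D_{\alpha}^{(1)}$ | $D_{\gamma}^{(1)} = D_{\alpha}^{(2)}$ | $D_{\alpha}^{(2)} = D_{\alpha}^{(1)} D_{\beta}^{(1)}$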
TABLE 2 Calculation of allele/species relative frequencies at the different levels of the hierarchical structure

Hierarchical level | Species/allele relative frequency
Population | $p_{i|jk} = N_{ijk}/N_{+jk} = N_{ijk} \big/ \sum_{i=1}^{S} N_{ijk}$
Region | $p_{i|+k} = N_{i+k}/N_{++k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$
Ecosystem | $p_{i|++} = N_{i++}/N_{+++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk}$
given in Table 3 along with the corresponding differentiation measures. Appendix S3 presents all mathematical derivations and discusses the desirable monotonicity and "true dissimilarity" properties that our proposed differentiation measures possess.

5 | IMPLEMENTATION OF THE FRAMEWORK BY MEANS OF AN R PACKAGE

The framework described above has been implemented in the R function iDIP (information-based Diversity Partitioning), which is provided as Data S1. We also provide a short introduction with a simple example data set to explain how to obtain numerical results equivalent to those provided in Tables 4 and 5 below for the Hawaiian archipelago example data set.

The R function iDIP requires two input matrices:

1. Abundance data specifying species/alleles (rows) raw or relative abundances for each population/community (columns).
2. Structure matrix describing the hierarchical structure of spatial subdivision; see a simple example given in Data S1. There is no limit to the number of spatial subdivisions.

The output includes (i) gamma (or total) diversity, alpha and beta diversity for each level; (ii) proportion of total beta information (among aggregates) found at each level; and (iii) mean differentiation (dissimilarity) at each level.

We also provide the R function iDIPphylo, which implements an information-based decomposition of phylogenetic diversity and therefore can take into account the evolutionary history of the species being studied. This function requires the two matrices mentioned above plus a phylogenetic tree in Newick format. For interested users without knowledge of R, we also provide an online version available from https://chao.shinyapps.io/iDIP/. This interactive web application was developed using Shiny (https://shiny.rstudio.com). The webpage contains tabs providing a short introduction describing how to use the tool, along with a detailed User's Guide, which provides proper interpretations of the output through numerical examples.
6 | SIMULATION STUDY TO SHOW THE CHARACTERISTICS OF THE FRAMEWORK
Here, we describe a simple simulation study to demonstrate the utility and numerical behaviour of the proposed framework. We considered an ecosystem composed of 32 populations divided into four hierarchical levels (ecosystem, region, subregion, population; Figure 1). The number of populations at each level was kept constant across all simulations (i.e., ecosystem with 32 populations, regions with 16 populations each and subregions with eight
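TABLE 3 Formulas for α, β and γ, along with differentiation measures, at each hierarchical level of spatial subdivision for species/allelic diversity and phylogenetic diversity. Here $D = {}^{1}D$ (Hill number of order q = 1 in Equation 2); $PD = {}^{1}PD$ (phylogenetic diversity of order q = 1 in Equation 5); T denotes the depth of an ultrametric tree; H = Shannon entropy (Equation 2); I = phylogenetic entropy (Equation 6).

Hierarchical level | Diversity | Species/allelic diversity | Phylogenetic diversity
Level 3: Ecosystem | gamma | $D_\gamma = \exp\left(-\sum_{i=1}^{S} p_{i++} \ln p_{i++}\right) \equiv \exp(H_\gamma)$ | $PD_\gamma = T \times \exp\left(-\sum_{i=1}^{B} L_i a_{i++} \ln a_{i++} / T\right) \equiv T \times \exp(I_\gamma / T)$
Level 2: Region | gamma | $D_\gamma^{(2)} = D_\gamma$ | $PD_\gamma^{(2)} = PD_\gamma$
Level 2: Region | alpha | $D_\alpha^{(2)} = \exp(H_\alpha^{(2)})$, where $H_\alpha^{(2)} = \sum_k w_{+k} H_{\alpha k}^{(2)}$ and $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i+k} \ln p_{i+k}$ | $PD_\alpha^{(2)} = T \times \exp(I_\alpha^{(2)} / T)$, where $I_\alpha^{(2)} = \sum_k w_{+k} I_{\alpha k}^{(2)}$ and $I_{\alpha k}^{(2)} = -\sum_{i=1}^{B} L_i a_{i+k} \ln a_{i+k}$
Level 2: Region | beta | $D_\beta^{(2)} = D_\gamma^{(2)} / D_\alpha^{(2)}$ | $PD_\beta^{(2)} = PD_\gamma^{(2)} / PD_\alpha^{(2)}$
Level 1: Population or community | gamma | $D_\gamma^{(1)} = D_\alpha^{(2)}$ | $PD_\gamma^{(1)} = PD_\alpha^{(2)}$
Level 1: Population or community | alpha | $D_\alpha^{(1)} = \exp(H_\alpha^{(1)})$, where $H_\alpha^{(1)} = \sum_{j,k} w_{jk} H_{\alpha jk}^{(1)}$ and $H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{ijk} \ln p_{ijk}$ | $PD_\alpha^{(1)} = T \times \exp(I_\alpha^{(1)} / T)$, where $I_\alpha^{(1)} = \sum_{j,k} w_{jk} I_{\alpha jk}^{(1)}$ and $I_{\alpha jk}^{(1)} = -\sum_{i=1}^{B} L_i a_{ijk} \ln a_{ijk}$
Level 1: Population or community | beta | $D_\beta^{(1)} = D_\gamma^{(1)} / D_\alpha^{(1)}$ | $PD_\beta^{(1)} = PD_\gamma^{(1)} / PD_\alpha^{(1)}$

Differentiation among aggregates at each level
Level 2: Among regions | $\Delta_D^{(2)} = \dfrac{H_\gamma - H_\alpha^{(2)}}{-\sum_k w_{+k} \ln w_{+k}}$ | $\Delta_{PD}^{(2)} = \dfrac{I_\gamma - I_\alpha^{(2)}}{-T \sum_k w_{+k} \ln w_{+k}}$
Level 1: Population/community within region | $\Delta_D^{(1)} = \dfrac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{j,k} w_{jk} \ln(w_{jk} / w_{+k})}$ | $\Delta_{PD}^{(1)} = \dfrac{I_\alpha^{(2)} - I_\alpha^{(1)}}{-T \sum_{j,k} w_{jk} \ln(w_{jk} / w_{+k})}$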
populations each). Note that here we used a hierarchy with four spatial subdivisions instead of three levels as used in the presentation of the framework. This decision was based on the fact that we wanted to simplify the presentation of calculations (three levels used), and in the simulations (four levels used), we wanted to verify the performance of the framework in a more in-depth manner.
We explored six scenarios varying in the degree of genetic structuring from very strong (Figure 2, top left panel) to very weak (Figure 2, bottom right panel), and for each we generated spatially structured genetic data for 10 unlinked bi-allelic loci using an algorithm loosely based on the genetic model of Coop, Witonsky, Di Rienzo, and Pritchard (2010). More explicitly, to generate correlated allele frequencies across populations for bi-allelic loci, we drew 10 random vectors of dimension 32 from a multivariate normal distribution with mean zero and a covariance matrix corresponding to the particular genetic structure scenario being considered. To construct the covariance matrix, we first assumed that the covariance between populations decreased with distance, so that the off-diagonal elements (covariances) for closest geographic neighbours were set to 4, those for the second nearest neighbours were set to 3 and so on; as such, the main diagonal values (variance) were set to 5. By multiplying the off-diagonal elements of this variance-covariance matrix by a constant (δ), we manipulated the strength of the spatial genetic structure from strong (δ = 0.1; Figure 2) to weak (δ = 6; Figure 2). Delta values were chosen to demonstrate gradual changes in estimates across diversity components. Using this procedure, we generated a matrix of random normally distributed N(0, 1) deviates $\varepsilon_{il}$ for each population i and locus l. The random deviates were transformed into allele frequencies constrained between 0 and 1 using the simple transform

$$
p_{il} =
\begin{cases}
0 & \text{if } \varepsilon_{il} < 0 \\
\varepsilon_{il} & \text{if } 0 \le \varepsilon_{il} \le 1 \\
1 & \text{if } \varepsilon_{il} > 1
\end{cases}
$$

where $p_{il}$ is the relative frequency of allele A1 at the lth locus in population i, and therefore $q_{il} = (1 - p_{il})$ is the relative frequency of allele A2. Each bi-allelic locus was analysed separately by our framework, and estimated values of Dγ, Dα and Dβ for each spatial level (see Figure 1) were averaged across the 10 loci.
To simulate a realistic distribution of number of individuals across populations, we generated random values from a log-normal distribution with mean 0 and log of standard deviation 1; these values were then multiplied by randomly generated deviates from a Poisson distribution with λ = 30 to obtain a wide range of population/community sizes. Rounded values (to mimic abundances of individuals) were then multiplied by $p_{il}$ and $q_{il}$ to generate allele abundances. Given that the number of individuals was randomly generated across populations, there is no spatial correlation in abundance of individuals across the landscape, which means that the genetic spatial patterns were solely determined by the variance-covariance matrix used to generate correlated allele frequencies across populations. This facilitates interpretation of the simulation results, allowing us to demonstrate that the framework can uncover subtle spatial effects associated with population connectivity (see below).
For each spatial structure, we generated 100 matrices of allele frequencies, and each matrix was analysed separately to obtain distributions for Dγ, Dα, Dβ and ΔD. Figure 2 presents heat maps of the correlation in allele frequencies across populations for one simulated data set under each δ value and shows that our algorithm can generate a wide range of genetic structures comparable to those generated by other more complex simulation protocols (e.g., de Villemereuil, Frichot, Bazin, Francois, & Gaggiotti, 2014).
Figure 3 shows the distribution of Dα, Dβ and ΔD values for the three levels of geographic variation below the ecosystem level (i.e., Dγ, genetic diversity). The results clearly show that our framework detects differences in genetic diversity across different levels of spatial genetic structure. As expected, the effective number of alleles (Dα component, top row) increases per region and subregion as the spatial structure becomes weaker (i.e., from small to large δ values) but remains constant at the population level, as there is no spatial structure at this level (i.e., populations are panmictic), so diversity is independent of δ.
The Dβ component (middle row) quantifies the effective number of aggregates (regions, subregions, populations) at each hierarchical level of spatial subdivision. The larger the effective number of aggregates at a given level, the more heterogeneous that level is. Thus, it is also a measure of compositional dissimilarity at each level. We use this interpretation to describe the results in a more intuitive manner. As expected, as δ increases, dissimilarity between regions (middle left panel) decreases because spatial genetic structure becomes weaker, and the compositional dissimilarity among populations within subregions (middle right panel) increases because the strong spatial correlation among populations within subregions breaks down (Figure 3, centre left panel). The compositional dissimilarity between subregions within regions (Figure 3, middle centre panel) first increases and then decreases with increasing δ. This is due to an “edge effect” associated with the marginal status of the subregions at the extremes of the species range (extreme right and left subregions in Figure 2). As δ increases, the composition of the two subregions at the centre of the species range, which belong to different regions, changes more rapidly than that of the two marginal subregions. Thus, the compositional dissimilarity between subregions within regions increases. However, as δ continues to increase, spatial effects disappear and dissimilarity decreases.
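The data-generating steps of the simulation study (covariance matrix, correlated normal deviates, truncation to [0, 1], and random population sizes) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' code: all function names are hypothetical, populations are assumed to be ordered along a line with covariance falling to zero beyond the fifth neighbour, the log-normal is parameterised with a log-scale standard deviation of 1, and negative eigenvalues of the scaled matrix (which arise for large δ) are clipped to zero so that sampling remains possible.

```python
import numpy as np

def covariance_matrix(n_pop, delta):
    """Variance 5 on the diagonal; covariance 4, 3, ... for successive
    neighbours, with off-diagonal entries multiplied by delta.
    Assumption: populations lie on a line and covariance is 0 beyond
    the fifth neighbour."""
    idx = np.arange(n_pop)
    dist = np.abs(idx[:, None] - idx[None, :])
    cov = np.clip(5.0 - dist, 0.0, None)
    off_diag = ~np.eye(n_pop, dtype=bool)
    cov[off_diag] *= delta
    return cov

def correlated_deviates(cov, n_loci, rng):
    """Draw n_loci zero-mean vectors with (approximately) covariance cov.
    Assumption: negative eigenvalues are clipped to zero."""
    vals, vecs = np.linalg.eigh(cov)
    root = vecs * np.sqrt(np.clip(vals, 0.0, None))
    return root @ rng.standard_normal((cov.shape[0], n_loci))

def to_frequency(eps):
    """Piecewise transform: 0 below 0, eps on [0, 1], 1 above 1."""
    return np.clip(eps, 0.0, 1.0)

def population_sizes(n_pop, rng, lam=30):
    """Log-normal(0, 1) values times Poisson(lam) deviates, rounded."""
    raw = rng.lognormal(mean=0.0, sigma=1.0, size=n_pop) * rng.poisson(lam, size=n_pop)
    return np.rint(raw)

rng = np.random.default_rng(1)
cov = covariance_matrix(32, delta=0.1)
p = to_frequency(correlated_deviates(cov, n_loci=10, rng=rng))  # p_il, shape (32, 10)
n = population_sizes(32, rng)                                   # individuals per population
abund_a1 = n[:, None] * p            # abundances of allele A1
abund_a2 = n[:, None] * (1.0 - p)    # abundances of allele A2
```

Each column of `abund_a1` and `abund_a2` could then be analysed locus by locus and the resulting diversity components averaged across loci, as described in the text.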
FIGURE 2 Heat maps of allele frequency correlations between pairs of populations for different δ values. Delta values control the strength of the spatial genetic structure among populations, with low δs having the strongest spatial correlation among populations. Each heat map represents the outcome of a single simulation, and each dot represents the allele frequency correlation between two populations. Thus, the diagonal represents the correlation of a population with itself and is always 1 regardless of the δ value considered in the simulation. Colours indicate range of correlation values. As in Figure 1, the dendrograms represent the spatial relationship (i.e., geographic distance) between populations.
The differentiation component ΔD (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity Dβ. This is expected, as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix
FIGURE 3 Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity gamma is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions).
in which different δ values would be used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength in spatial genetic variation.
For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ, from Dγ = 1.6 on average across simulations for δ = 0.1 up to Dγ = 1.9 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here regions, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.
7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM
All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish Etelis coruscans (Andrews et al., 2014) and a shallow-water fish Zebrasoma flavescens (Eble et al., 2011).
The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string
FIGURE 4 Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass.
of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km southwest of the MHI at Johnston Atoll and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).
We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.
7.1 | Species diversity
Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, Dγ, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_\alpha^{(2)} = 37.77$; Island: $D_\alpha^{(1)} = 27.75$). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_\beta^{(2)} = 1.29$ represents the number of region equivalents in the Hawaiian archipelago, while $D_\beta^{(1)} = 1.361$ is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on the sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain ΔD (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation in terms of the mean proportion of nonshared species is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
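As a quick consistency check on the multiplicative decomposition, the beta diversities quoted above can be recomputed from the gamma and alpha values reported in Table 4. The short Python sketch below only uses those reported numbers; the variable names are illustrative.

```python
# Values reported in Table 4 (fish species diversity, order q = 1).
d_gamma = 48.744     # ecosystem-level gamma diversity
d_alpha2 = 37.773    # regional alpha diversity
d_alpha1 = 27.752    # island alpha diversity

d_beta2 = d_gamma / d_alpha2     # region equivalents in the archipelago
d_beta1 = d_alpha2 / d_alpha1    # island equivalents within a region

loss_to_region = d_gamma - d_alpha2    # species equivalents lost, level 3 to 2
loss_to_island = d_alpha2 - d_alpha1   # species equivalents lost, level 2 to 1

print(round(d_beta2, 3), round(d_beta1, 3))
print(round(loss_to_region, 2), round(loss_to_island, 2))
```

The ratios agree with the tabulated beta values to three decimals, and both differences are close to the "approximately 10 species equivalents" mentioned in the text.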
We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than the diversities of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and to a lesser extent Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
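The effect of the order q on a diversity profile can be illustrated with a small Python sketch of the standard Hill number formula. The function name `hill_number` and the example community are hypothetical and are not taken from the Hawaiian data; NumPy is assumed.

```python
import numpy as np

def hill_number(p, q):
    """Hill number (effective number of species/alleles) of order q."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if q == 1:
        # Limit as q -> 1: exponential of Shannon entropy.
        return float(np.exp(-(p * np.log(p)).sum()))
    return float((p ** q).sum() ** (1.0 / (1.0 - q)))

# Hypothetical community: one species at 55% and nine rare species at 5% each.
dominated = np.array([0.55] + [0.05] * 9)
profile = [hill_number(dominated, q) for q in (0, 1, 2)]

# Completely even community of 10 species, for comparison.
even = np.full(10, 0.1)
even_profile = [hill_number(even, q) for q in (0, 1, 2)]
```

For the even community all three orders equal the number of species, whereas for the dominated community the values of order q = 1 and 2 fall well below the richness of 10, which is the signature of rare species described above.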
Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.
7.2 | Genetic diversity
Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases, genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed
TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

Level | Diversity
3 Hawaiian Archipelago | $D_\gamma = 48.744$
2 Region | $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)} = 37.773$; $D_\beta^{(2)} = 1.290$
1 Island (community) | $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)} = 27.752$; $D_\beta^{(1)} = 1.361$

Differentiation among aggregates at each level
2 Region | $\Delta_D^{(2)} = 0.290$
1 Island (community) | $\Delta_D^{(1)} = 0.153$
among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and therefore may lead to large differences in larval dispersal potential between the two species.
Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.
As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI despite its higher level of anthropogenic perturbations (Figure 6b,c).
8 | DISCUSSION
Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, ..., community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.
Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and therefore give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g.,
FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities, (b) genetic diversity for Etelis coruscans, (c) genetic diversity for Zebrasoma flavescens
bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts the Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.
As proved by Chao et al. (2015, Appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost's D, do not possess either of these two properties.
Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and “true dissimilarity” properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and
FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness, (b) allelic richness for Etelis coruscans, (c) allelic richness for Zebrasoma flavescens. Panel titles: (a) Species diversity; (b) Genetic diversity, E. coruscans; (c) Genetic diversity, Z. flavescens
TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level | Diversity
3 Hawaiian Archipelago | $D_\gamma = 8.249$
2 Region | $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)} = 8.083$; $D_\beta^{(2)} = 1.016$
1 Island (population) | $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)} = 7.077$; $D_\beta^{(1)} = 1.117$

Differentiation among aggregates at each level
2 Region | $\Delta_D^{(2)} = 0.023$
1 Island (community) | $\Delta_D^{(1)} = 0.062$
our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.
Recently, Karlin and Smouse (2017; Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the “true dissimilarity” property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of “true dissimilarity” can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure “true dissimilarity.” Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
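The two-population example can be worked through with the among-aggregate differentiation formula of Table 3. The derivation below is a sketch that assumes equal weights for the two populations ($w_1 = w_2 = 1/2$), an assumption not stated in the example itself. Each population has 10 alleles at frequency 0.1; in the pooled population, the 4 shared alleles have frequency 0.1 and the 12 unshared alleles have frequency 0.05.

```latex
\begin{aligned}
H_\alpha &= \tfrac{1}{2}\ln 10 + \tfrac{1}{2}\ln 10 = \ln 10, \\
H_\gamma &= -\left[\,4(0.1)\ln 0.1 + 12(0.05)\ln 0.05\,\right]
          = 0.4\ln 10 + 0.6\ln 20 = \ln 10 + 0.6\ln 2, \\
\Delta_D &= \frac{H_\gamma - H_\alpha}{-\sum_{k} w_k \ln w_k}
          = \frac{0.6\ln 2}{\ln 2} = 0.6 = 1 - \frac{A}{S} = 1 - \frac{4}{10}.
\end{aligned}
```

Under this equal-weight assumption, the measure recovers the nonshared proportion of 60% exactly.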
In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would be surely observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity Dβ and differentiation ΔD measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.
The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.
Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural
TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

Level | Diversity
3 Hawaiian Archipelago | $D_\gamma = 8.404$
2 Region | $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)} = 8.290$; $D_\beta^{(2)} = 1.012$
1 Island (community) | $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)} = 7.690$; $D_\beta^{(1)} = 1.065$

Differentiation among aggregates at each level
2 Region | $\Delta_D^{(2)} = 0.014$
1 Island (community) | $\Delta_D^{(1)} = 0.033$
populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.
We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species-gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases, there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides “true diversity” measures that are consistent across levels of organization, and therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
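The percentages quoted above appear to correspond to the relative drop from regional to island alpha diversity, computed from the values in Tables 4 to 6. This reading is an assumption, since the formula is not stated in the text, and small discrepancies in the last digit may reflect rounding of the tabulated values.

```latex
\frac{D_\alpha^{(2)} - D_\alpha^{(1)}}{D_\alpha^{(2)}}:\qquad
\underbrace{\frac{37.773 - 27.752}{37.773} \approx 0.265}_{\text{species}},\qquad
\underbrace{\frac{8.083 - 7.077}{8.083} \approx 0.124}_{E.\ coruscans},\qquad
\underbrace{\frac{8.290 - 7.690}{8.290} \approx 0.072}_{Z.\ flavescens}
```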
In our hierarchical framework and analysis based on the Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)'s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)'s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level is transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithms may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.
The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.
An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and in doing so provides new tools to make inferences about biodiversity processes from observed spatial patterns.
ACKNOWLEDGEMENTS
This work was assisted through participation in the "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.
DATA ARCHIVING STATEMENT
All data used in this manuscript are available in DRYAD (https://doi.org/dx.doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).
ORCID
Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493
Christine Edwards http://orcid.org/0000-0001-8837-4872
REFERENCES
Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2
Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101
Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations (2nd ed.). Hoboken, NJ: Wiley-Blackwell.
Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665
Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.
Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.
Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1 (edited by S. E. Fienberg), 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654
Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551
Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247
Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272
Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45 (edited by D. J. Futuyma), 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540
Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471
Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014
Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819
Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083
Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1
Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2
Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036
Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75
Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352
Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x
Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x
Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1
Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x
Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum x falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.
Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.
Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.
Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3
Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x
Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.
Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x
Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x
Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x
Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.
Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x
Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.
Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.
Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696
Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B: Biological Sciences, 283(1829).
Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z
Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765
Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x
Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.
Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422
Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756
de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.
Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x
Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516
Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.
Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.
How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593
APPENDIX S1. Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele   Population 1   Population 2
A1       0.5            0.01
A2       0.5            0.10
A3       0              0.665
A4       0              0.215
A5       0              0.01
He       0.5            0.5
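The table can be checked with a few lines of R. This is a minimal sketch: the function names are ours, and the effective number of alleles is computed with the standard relation 1 / (1 − He), which is the Hill number of order 2.

```r
# Allele frequencies from the table in Appendix S1.
pop1 <- c(A1 = 0.5,  A2 = 0.5,  A3 = 0,     A4 = 0,     A5 = 0)
pop2 <- c(A1 = 0.01, A2 = 0.10, A3 = 0.665, A4 = 0.215, A5 = 0.01)

# Expected heterozygosity and the corresponding effective number of alleles.
expected_het      <- function(p) 1 - sum(p^2)
effective_alleles <- function(p) 1 / sum(p^2)   # equals 1 / (1 - He)

he <- c(pop1 = expected_het(pop1), pop2 = expected_het(pop2))
ne <- c(pop1 = effective_alleles(pop1), pop2 = effective_alleles(pop2))
print(round(rbind(He = he, effective = ne), 3))
```

Population 2 has an expected heterozygosity of 0.50135, which the table rounds to 0.5, so its effective number of alleles is only approximately two.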
APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol                        Definition
${}^{q}D$                     abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$                    phylogenetic diversity of order q
$S$                           total number of distinct elements (species/alleles)
$H$                           Shannon entropy
$K$                           number of regions
$J_k$                         number of local populations/communities within region k
$w_{jk}$                      weight given to population/community j of region k
$w_{+k}$                      weight given to region k
$D_{\alpha}^{(l)}$            alpha abundance diversity at level l of the hierarchy
$D_{\beta}^{(l)}$             beta abundance diversity at level l of the hierarchy
$D_{\gamma}^{(l)}$            gamma abundance diversity at level l of the hierarchy
$l$                           superscript to denote population/community, subregion, region, …
$D_{\gamma}$                  gamma abundance diversity at the ecosystem level
$N_{injk}$                    number of individuals with $n = 0, 1, 2$ copies of allele i in population/community j of region k
$N_{ijk}$                     total number of copies of allele/species i in population/community j of region k
$N_{+jk}$                     total number of alleles/individuals in population j of region k
$N_{++k}$                     total number of alleles/individuals in region k
$N_{+++}$                     total number of alleles/individuals in the ecosystem
$p_{i|jk}$                    frequency of allele/species i in population j of region k
$p_{i|+k}$                    frequency of allele/species i in region k
$p_{i|++}$                    frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$         alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$          alpha entropy for region k
$H_{\alpha}^{(l)}$            total alpha entropy at level l of the hierarchy
$\Delta_{D}^{(l)}$            abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$                           number of branch segments in the phylogenetic tree
$L_i$                         length of branch $i = 1, 2, 3, \ldots, B$
$a_i$                         total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$                     mean branch length
$T$                           depth of an ultrametric tree ($= \bar{T}$)
$Q$                           Rao's quadratic entropy
$I$                           phylogenetic entropy
$a_{i|jk}$                    total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$                    total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$                    total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_{\alpha}^{(l)}$           alpha phylogenetic diversity at level l of the hierarchy
$PD_{\beta}^{(l)}$            beta phylogenetic diversity at level l of the hierarchy
$PD_{\gamma}^{(l)}$           gamma phylogenetic diversity at level l of the hierarchy
$PD_{\gamma}$                 gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$         alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$          alpha phylogenetic entropy for region k
$I_{\alpha}^{(l)}$            total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$           phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)
APPENDIX S3. Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are $J$ populations and $S$ alleles in an ecosystem. Denote the relative frequency of allele $i$ within population $j$ as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any $j = 1, 2, \ldots, J$. For any given population weights $(w_1, w_2, \ldots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele $i$ in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$
Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the $J$ populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.
Proof. Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele $i$ we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele $i = 1, 2, \ldots, S$, i.e., all $J$ populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows. Because $\sum_{m=1}^{J} w_m p_{i|m} \ge w_j p_{i|j}$ for every $j$, we have

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big) \le -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j} \ln\big(w_j p_{i|j}\big)\Big)$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$

When all $J$ populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \qquad \text{(C1)}$$
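As an illustration of Eq. (C1), the following R sketch computes the two-level Shannon differentiation for three hypothetical pairs of populations. The function names, frequencies and weights are ours and are chosen only to exercise the two extremes given in Theorem S3.1.

```r
# Shannon entropy of a frequency vector (zero frequencies contribute 0).
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Two-level Shannon differentiation, Eq. (C1).
# freq: alleles in rows, populations in columns; each column sums to 1.
# w:    population weights summing to 1.
shannon_diff <- function(freq, w) {
  h_gamma <- shannon(as.vector(freq %*% w))     # entropy of pooled frequencies p_{i|+}
  h_alpha <- sum(w * apply(freq, 2, shannon))   # weighted mean within-population entropy
  (h_gamma - h_alpha) / shannon(w)              # shannon(w) = -sum(w * log(w))
}

w <- c(0.3, 0.7)                                # hypothetical weights
identical_pops <- cbind(c(0.2, 0.8, 0, 0), c(0.2, 0.8, 0, 0))
distinct_pops  <- cbind(c(0.2, 0.8, 0, 0), c(0, 0, 0.5, 0.5))
partial_pops   <- cbind(c(0.2, 0.8, 0, 0), c(0, 0.4, 0.3, 0.3))

d_identical <- shannon_diff(identical_pops, w)
d_distinct  <- shannon_diff(distinct_pops, w)
d_partial   <- shannon_diff(partial_pops, w)
print(c(identical = d_identical, distinct = d_distinct, partial = d_partial))
```

Identical populations give a value of 0 and completely distinct populations give 1 (up to floating-point error), whatever the weights; partially overlapping populations fall in between.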
This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.
(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: if multiple communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives $1 - A/S$, the true proportion of non-shared species in a community.
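Property (B) can be checked numerically. The R sketch below builds N hypothetical communities that satisfy the conditions of the property and compares the Shannon differentiation of Eq. (C1) with 1 − A/S. The helper `make_communities` and the values of N, S and A are ours, and equal community weights are assumed for simplicity.

```r
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Build N communities, each with S equally common species, of which A are
# shared by all communities and S - A are found in that community only.
make_communities <- function(N, S, A) {
  n_species <- A + N * (S - A)
  freq <- matrix(0, nrow = n_species, ncol = N)
  for (j in seq_len(N)) {
    unique_rows <- A + (j - 1) * (S - A) + seq_len(S - A)
    freq[c(seq_len(A), unique_rows), j] <- 1 / S
  }
  freq
}

N <- 3; S <- 10; A <- 4
freq <- make_communities(N, S, A)
w <- rep(1 / N, N)                               # equal weights (assumed)

h_gamma <- shannon(as.vector(freq %*% w))
h_alpha <- sum(w * apply(freq, 2, shannon))
delta   <- (h_gamma - h_alpha) / shannon(w)      # Eq. (C1)
print(c(shannon_differentiation = delta, expected = 1 - A / S))
```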
Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \qquad \text{(C2)}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $\big[H_\alpha^{(2)} - H_\alpha^{(1)}\big]$ denotes the among-population information within a region, and $\big[H_\gamma - H_\alpha^{(2)}\big]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in the three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C3)}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \qquad \text{(C4)}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are $K$ regions, with allele relative frequency $p_{i|+k}$ for allele $i$ in region $k$, and with the region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} w_{+k} p_{i|+k}\Big) \ln\Big(\sum_{m=1}^{K} w_{+m} p_{i|+m}\Big).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region $k$ and all populations within region $k$): there are $J_k$ populations, with allele relative frequency $p_{i|jk}$ for allele $i$ in population $j$, and with population weights $(w_{1k}/w_{+k},\, w_{2k}/w_{+k},\, \ldots,\, w_{J_k k}/w_{+k})$, $j = 1, 2, \ldots, J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region $k$, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over $k$ with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \qquad \text{(C5)}$$
Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \qquad \text{(C6)}$$
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\big(w_{jk} p_{i|jk}\big)\Big)$$

$$= -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big[\ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big]\Big)$$

$$= -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk}\Big) - \sum_{i=1}^{S} \Big(\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \frac{w_{jk}}{w_{+k}}\Big) - \sum_{i=1}^{S} \Big(\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k}\Big)$$

$$= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k}$$

$$\equiv H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big].$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The gamma and alpha phylogenetic entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[I_\alpha^{(2)} - I_\alpha^{(1)}\big] + \big[I_\gamma - I_\alpha^{(2)}\big]. \qquad \text{(C7)}$$
Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $\big[I_\alpha^{(2)} - I_\alpha^{(1)}\big]$ denotes the among-population phylogenetic information within a region, and $\big[I_\gamma - I_\alpha^{(2)}\big]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth $T$:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C8)}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \qquad \text{(C9)}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
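SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represent the ecosystem. Shannon entropies $H_{\ast}$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1: diagram with columns Within, Between, Total and Decomposition, and rows for level 3 (Ecosystem), level 2 (Region) and level 1 (Population or Community). The five populations carry frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$ and entropies $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$, $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$; the two regions carry $p_{i|+1}$, $p_{i|+2}$ and $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$; the ecosystem carries $p_{i|++}$; the totals are $H_{\alpha}^{(1)}$, $H_{\alpha}^{(2)}$ and $H_{\gamma}$.]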
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/alleles abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2: ultrametric tree with five tips (abundances p1 to p5), interior nodes with abundances p1 + p2, p1 + p2 + p3 and p4 + p5, and the root labelled MRCA.]
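The extended abundance set of Figure S2 can be turned into a phylogenetic entropy with a short R sketch. The tip abundances and branch lengths below are hypothetical, since the figure does not give them. The transformation from phylogenetic entropy to an effective number is taken here to be exp(I/T), which is our reading of the order q = 1 measure of Chao et al. (2010) and should be checked against Table 3 of the main text.

```r
# Relative abundances of the five tips (hypothetical values).
p <- c(0.10, 0.20, 0.30, 0.15, 0.25)

# Node abundances in the order used in Figure S2:
# five tips, then the interior nodes (1,2), (1,2,3) and (4,5).
a <- c(p, p[1] + p[2], p[1] + p[2] + p[3], p[4] + p[5])

# Hypothetical lengths of the branch leading to each node, chosen so that
# every tip lies at depth T = 3 below the root (ultrametric tree).
L <- c(1, 1, 2, 1.5, 1.5,   # tips 1 to 5
       1,                   # interior node (1,2)
       1,                   # interior node (1,2,3)
       1.5)                 # interior node (4,5)
T_depth <- 3

phylo_entropy <- -sum(L * a * log(a))          # I = -sum_i L_i a_i ln a_i
mean_pd       <- exp(phylo_entropy / T_depth)  # effective number of lineages (assumed form)
branch_pd     <- T_depth * mean_pd             # effective total branch length (assumed form)
print(c(I = phylo_entropy, mean_PD = mean_pd, branch_PD = branch_pd))
```

For an ultrametric tree, the abundance-weighted branch lengths sum to the tree depth, `sum(L * a) == T_depth`, which is a convenient check that the branch lengths and node abundances have been entered consistently.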
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1: Population 1, Population 2
    Region 2: Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

          Pop1  Pop2  Pop3  Pop4  Pop5
Allele1      1    16     2    10    15
Allele2      0     0     0     5    14
Allele3      7    12    11     1     0
Allele4      0     5    14     1    21
Allele5      2     1     0    11    10
Allele6      0     1     3     2     0
The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.
Pop1 Pop2 Pop3 Pop4 Pop5
Level3 Ecosystem Ecosystem Ecosystem Ecosystem Ecosystem
Level2 Region1 Region1 Region2 Region2 Region2
Level1 Population1 Population2 Population3 Population4 Population5
For the above simple example, input data for R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))
# Run R function
iDIP(Data, Struc)
Output is given below
                   [,1]
D_gamma           5.272
D_alpha2          4.679
D_alpha1          3.540
D_beta2           1.127
D_beta1           1.322
Proportion2       0.300
Proportion1       0.700
Differentiation2  0.204
Differentiation1  0.310
We give simple interpretations for the above output (all the "effective" is in a sense of equally abundant alleles/populations/regions).
(1a) D_gamma = 5.272 is interpreted as that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha2 = 4.679 is interpreted as that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).
(1c) D_alpha1 = 3.540 is interpreted as that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 is interpreted as that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).
(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
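The multiplicative relationships and the proportions quoted in this interpretation can be checked from the printed values alone. The R sketch below assumes only the rounded numbers shown in the output; the variable names are illustrative and are not part of the iDIP code, so small discrepancies in the third decimal place are to be expected.

```r
# Rounded values copied from the iDIP output above
D_gamma  <- 5.272
D_alpha2 <- 4.679
D_alpha1 <- 3.540
D_beta2  <- 1.127
D_beta1  <- 1.322

# alpha x beta at one level reproduces the diversity of the level above
gamma_check  <- D_alpha2 * D_beta2   # close to D_gamma
alpha2_check <- D_alpha1 * D_beta1   # close to D_alpha2

# Proportions of total beta information are ratios on the log (entropy) scale
total_beta  <- log(D_gamma) - log(D_alpha1)
proportion2 <- (log(D_gamma)  - log(D_alpha2)) / total_beta
proportion1 <- (log(D_alpha2) - log(D_alpha1)) / total_beta

round(c(gamma_check, alpha2_check, proportion2, proportion1), 3)
```

Because the inputs are rounded, the recomputed values agree with the printed output only to about two or three decimal places.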
R code for Information-based Diversity Partitioning for Any Number of Levels

# Input data:
#   abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
# Output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
#   (2) Proportion of total beta information found at each level.
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1 measures the mean
#       dissimilarity among populations (level 1) within a region; Differentiation2 measures the mean
#       dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank" or "NA" in your data
#       by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

# Main function iDIP
iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga > 0] / n
  G = sum(-gp * log(gp))
  S = length(gp)

  H = nrow(struc)
  A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
  Diff = numeric(H-1); Prop = numeric(H-1)

  wi = colSums(abun) / n
  W[H-1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H-1] = sum(wi * Ai)
  if (H > 2) {
    for (i in 2:(H-1)) {
      I = unique(struc[i, ]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) { ai[, j] = abun[, II]
        } else { ai[, j] = rowSums(abun[, II]) }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i-1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i-1] = sum(wi * Ai)
    }
  }
  total = G - A[H-1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H-1)) {
      Diff[i] = (A[i-1] - A[i]) / (W[i] - W[i-1])
      Prop[i] = (A[i-1] - A[i]) / total
      B[i] = exp(A[i-1]) / exp(A[i])
    }
  }
  Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha", (H-1):1),
                     paste0("D_beta", (H-1):1),
                     paste0("Proportion", (H-1):1),
                     paste0("Differentiation", (H-1):1))
  return(out)
}
Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described in the species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree format. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying hierarchical structure matrix; see the simple example given above for the species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundances data as in the species diversity for illustration. A simulated phylogenetic tree for 6 species is given below.
# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) = paste0("Allele",1:6)
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")
# Run R function
iDIPphylo(Data, Struc, Tree)
Output is given below
                [,1]
Faith's PD   321.525
mean_T        94.169
PD_gamma     274.388
PD_alpha2    255.194
PD_alpha1    223.231
PD_beta2       1.075
PD_beta1       1.143
PD_prop2       0.351
PD_prop1       0.649
PD_diff2       0.124
PD_diff1       0.149
The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.
(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 is interpreted as that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha2 = 255.194 is interpreted as that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma).
(3c) PD_alpha1 = 223.231 is interpreted as that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2).
(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found in the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found in the community level is 64.9%.
(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

# Input data:
#   abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
#   tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data
#       by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
# Output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
#   (2) mean_T: weighted (by species abundance) mean of the distances from root node to each of the tips
#       in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level.
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information found in
#       Level 1 and Level 2, respectively.
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1 measures the mean
#       phylogenetic dissimilarity among communities (Level 1) within a region (Level 2); PD_diff2 measures
#       the mean phylogenetic dissimilarity among regions (Level 2), etc.

# Three packages "ade4", "ape" and "phytools" must be installed first
install.packages("ade4"); library(ade4)
install.packages("ape"); library(ape)
install.packages("phytools"); library(phytools)

# Main function iDIPphylo
iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }

  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames), dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
  pB = c(phyloData$leaves, phyloData$nodes)
  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n
  TT = sum(gp * pB)
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])

  H = nrow(struc)
  A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
  Diff = numeric(H-1); Prop = numeric(H-1)

  wi = colSums(abun) / n
  W[H-1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H-1] = sum(wi * Ai)
  if (H > 2) {
    for (i in 2:(H-1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) { pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else { pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II]) }
      }
      # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = ni / sum(ni)
      W[i-1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i-1] = sum(wi * Ai)
    }
  }
  total = G - A[H-1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H-1)) {
      Diff[i] = (A[i-1] - A[i]) / (W[i] - W[i-1])
      Prop[i] = (A[i-1] - A[i]) / total
      B[i] = exp(A[i-1]) / exp(A[i])
    }
  }
  Gamma = exp(G) * TT; Alpha = exp(A) * TT; Diff = Diff; Prop = Prop
  # Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  # out1 = iDIP(abun, struc); out = cbind(out1, out2)
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H-1):1),
                     paste0("PD_beta", (H-1):1),
                     paste0("PD_prop", (H-1):1),
                     paste0("PD_diff", (H-1):1))
  return(out)
}
and superscript q is the order of the diversity and indicates the sensitivity of $^{q}D$, the numbers equivalent of the diversity measure being used, to common and rare elements (Jost, 2006). The diversity of order zero (q = 0) is completely insensitive to species or allele frequencies and is known, respectively, as species or allelic richness, depending on whether it is applied to species or allele frequency data. The diversity of order one (q = 1) weights the contribution of each species or allele by their frequency without favouring either common or rare species/alleles. Although Equation 1 is not defined for q = 1, its limit exists (Jost, 2006) and is given by Equation 2, where H is the Shannon entropy.

All values of q greater than unity disproportionally favour the most common species or allele. For example, the Simpson concentration and the Gini–Simpson index, which are, respectively, equivalent to expected homozygosity and expected heterozygosity when applied to allele frequency data, lead to diversities of order 2 and give the same effective number of species or alleles (Equation 3).
It is worth emphasizing that among all these different number equivalents or true diversity measures, the diversity of order 1 is key because of its ability to weigh elements precisely by their frequency without favouring either rare or common elements (Jost, 2006). Therefore, we will use this measure to define our new framework for diversity decomposition.
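The behaviour of the diversities of order 0, 1 and 2 can be illustrated with a short R sketch. The function name hill_number and the frequency vector are hypothetical and chosen only for illustration; the general form $(\sum_i p_i^q)^{1/(1-q)}$ is the standard Hill number, and the q = 1 case uses its limit, the exponential of Shannon entropy.

```r
# Hill number (effective number of species or alleles) of order q.
# p: vector of relative frequencies; zero entries are dropped.
hill_number <- function(p, q) {
  p <- p[p > 0]
  p <- p / sum(p)                      # guard against unnormalised input
  if (isTRUE(all.equal(q, 1))) {
    exp(-sum(p * log(p)))              # limit q -> 1: exp(Shannon entropy)
  } else {
    sum(p^q)^(1 / (1 - q))             # general form for q != 1
  }
}

p  <- c(0.5, 0.3, 0.2)                 # hypothetical allele frequencies
D0 <- hill_number(p, 0)                # richness: counts alleles, ignores frequencies
D1 <- hill_number(p, 1)                # weighs alleles exactly by their frequency
D2 <- hill_number(p, 2)                # inverse of expected homozygosity
```

For any frequency vector the three values are ordered D0 >= D1 >= D2, and they coincide only when all alleles are equally frequent.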
4 | WEIGHTED INFORMATION-BASED DECOMPOSITION FRAMEWORK (q = 1)

Our decomposition framework is focused on the information-based diversity measure (Hill number of order q = 1). In what follows, we first describe the framework in terms of abundance (species/genetic) diversities and then we provide an equivalent formulation in terms of phylogenetic diversity. For simplicity, we will use the notation D to refer to abundance diversities and PD to refer to phylogenetic diversities, both of order q = 1. Appendix S2 lists all notation and definitions of the parameters and variables we used.

4.1 | Formulation in terms of abundance diversity
Here, we develop a framework applicable to both species (abundance, presence–absence, biomass) and genetic data to estimate alpha, beta and gamma diversities (i.e. diversity components) across different levels of a hierarchical spatial structure. In this section, we consider a very simple example of an ecosystem subdivided into multiple regions, each of which in turn is subdivided into a number of communities when considering species data, or a number of populations when considering genetic data. However, our formulation is applicable to any number of levels within a spatially hierarchical partitioning scheme and their associated number of communities and populations at each level (nested scale), such as the example considered in our simulation study below (see Figure 1). Indeed, the framework described here allows decomposing species and genetic information on an equal footing, thus allowing contrasting diversity components across communities and populations. In other words, if genetic and species abundance (or presence–absence) data are available for every population and every species, then genetic and species diversity components can be contrasted within and among spatial scales, as well as across different phylogenetic levels. Note that our proposed framework is based on diversities of order q = 1, which are less sensitive than diversities of higher order to the fact that genetic information is not available for all individuals in a population but is rather based on subsamples of individuals within populations. As such, using q = 1 allows one to decompose genetic variation consistently across different spatial subdivision levels that may vary in abundance.
The final objective was to decompose the global (ecosystem) diversity into its regional and community/population-level components. We do this using the well-known additive property of Shannon entropy across hierarchical levels (and thus multiplicative partitioning of diversity) (Batty, 1976; Jost, 2007). Table 1 presents the diversities (number equivalents) that need to be estimated at each level of the hierarchy. For each level, there will be one value corresponding to species diversity and another corresponding to allelic (genetic) diversity of a particular species at a given locus (or an average across loci). Figure S1 provides a schematic representation of the calculation of diversities.
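The step from additive entropies to multiplicative diversities can be written out for the three-level case. This is a short worked derivation using the notation of Table 1, with $H_\gamma$, $H_\alpha^{(2)}$ and $H_\alpha^{(1)}$ denoting the Shannon entropies whose exponentials are $D_\gamma$, $D_\alpha^{(2)}$ and $D_\alpha^{(1)}$; it is a restatement for illustration, not an additional result.

```latex
\begin{align*}
H_\gamma &= H_\alpha^{(1)}
          + \bigl(H_\alpha^{(2)} - H_\alpha^{(1)}\bigr)
          + \bigl(H_\gamma - H_\alpha^{(2)}\bigr) \\
\exp(H_\gamma) &= \exp\bigl(H_\alpha^{(1)}\bigr)\,
          \exp\bigl(H_\alpha^{(2)} - H_\alpha^{(1)}\bigr)\,
          \exp\bigl(H_\gamma - H_\alpha^{(2)}\bigr) \\
D_\gamma &= D_\alpha^{(1)}\, D_\beta^{(1)}\, D_\beta^{(2)},
\qquad
D_\beta^{(1)} = \frac{D_\alpha^{(2)}}{D_\alpha^{(1)}},
\quad
D_\beta^{(2)} = \frac{D_\gamma}{D_\alpha^{(2)}}
\end{align*}
```

Each bracketed entropy difference is the beta information at one level, which is why the proportions of beta information across levels sum to one.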
From Table 1, it is apparent that we only need to use Equation 2 to calculate three diversity indices, namely $D_\alpha^{(1)}$, $D_\alpha^{(2)}$ and $D_\gamma$. These diversity measures are defined in terms of relative abundances of the distinct elements (species or alleles) at the respective levels of the hierarchy. In what follows, we first present the framework as applied to allele count data and then explain how a simple change in the definition of a single parameter allows the application of the same framework to species abundance data. We assume that we are considering a diploid species (but the scheme can be easily generalized for polyploid species) and focus on the diversity of order q = 1, which is based on the Shannon entropy (see Equation 1).
Genetic diversity indices are calculated separately for each locus, so we focus here on a locus with S alleles. Additionally, we consider an ecosystem subdivided into K regions, each having $J_k$ local populations. Let $N_{injk}$ be the number of diploid individuals with n (= 0, 1, 2) copies of allele i in population j and region k. Then, the total number of copies of allele i in population j and region k is $N_{ijk} = \sum_{n=0}^{2} n N_{injk}$, and from this we can derive the total number of alleles in population j and region k as $N_{+jk} = \sum_{i=1}^{S} N_{ijk}$, the total number of alleles in region k as $N_{++k} = \sum_{j=1}^{J_k} N_{+jk}$ and the total number of alleles in the ecosystem as $N_{+++} = \sum_{k=1}^{K} N_{++k}$.

All allele frequencies can be derived from these allele counts. For example, the relative frequency of allele i in any given population j within region k is $p_{i|jk} = N_{ijk}/N_{+jk}$. In the case of region- and ecosystem-level allele frequencies, we pool over populations within regions and over all regions and populations within an ecosystem, respectively. We define the weight for population j and region k as $w_{jk} = N_{+jk}/N_{+++}$; the weight for region k thus becomes $w_{+k} = \sum_{j=1}^{J_k} w_{jk} = N_{++k}/N_{+++}$. Table 2 describes how allele/species relative frequencies at each level are calculated in terms of these weight functions.

$$^{1}D = \exp\left(-\sum_{i=1}^{S} p_i \ln p_i\right) = \exp(H) \tag{2}$$

$$^{2}D = 1 \Big/ \left(\sum_{i=1}^{S} p_i^{2}\right) \tag{3}$$
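The allele-count bookkeeping and the weights defined above can be sketched in R as follows. All numbers are hypothetical: one population of 10 diploid individuals scored at a locus with three alleles, and five populations in two regions with made-up total allele counts. The object names (N_in, N_pop, region) are illustrative and do not come from the iDIP code.

```r
# N_in[i, ]: number of individuals carrying n = 0, 1, 2 copies of allele i
# in one population (hypothetical counts; each row sums to 10 individuals).
copies <- 0:2
N_in <- rbind(A = c(3, 5, 2),
              B = c(5, 4, 1),
              C = c(6, 3, 1))

# N_ijk = sum over n of n * N_injk: allele copies of each allele in this population
N_i <- as.vector(N_in %*% copies)
p_i <- N_i / sum(N_i)                # within-population frequencies p_{i|jk}

# Weights from total allele counts N_{+jk} of five populations in two regions
N_pop  <- c(20, 70, 60, 60, 120)     # hypothetical N_{+jk}
region <- c(1, 1, 2, 2, 2)           # region membership of each population
w_jk <- N_pop / sum(N_pop)           # population weights w_{jk}
w_k  <- tapply(w_jk, region, sum)    # region weights w_{+k} = N_{++k} / N_{+++}
```

In a diploid population the allele copies must sum to twice the number of individuals, which gives a quick consistency check on the genotype counts.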
Using these frequencies, we can calculate the genetic diversities at each level of spatial organization. Table 3 presents the formulas for $D_\alpha^{(1)}$, $D_\alpha^{(2)}$ and $D_\gamma$; all other diversity measures can be derived from them (see Table 1). In the case of the ecosystem diversity, this amounts to simply replacing $p_i$ in Equation 2 by $p_{i|++}$, the allele frequency at the ecosystem level (see Table 2). To calculate the diversity at the regional level, we first calculate the entropy $H_{\alpha k}^{(2)}$ for each individual region k and then obtain the weighted average over all regions, $H_\alpha^{(2)}$. Finally, we calculate the exponential of the region-level entropy to obtain $D_\alpha^{(2)}$, the alpha diversity at the regional level. We proceed in a similar fashion to obtain $D_\alpha^{(1)}$, the diversity at the population level, but in this case we need to average over regions and populations within regions.
The calculation of the equivalent diversities based on species count data can be carried out using the exact same procedure described above, but in this case, $N_{ijk}$ represents the number of individuals of species i in population j and region k. All formulas for gamma, alpha and beta, along with the differentiation measures at each level, are given in Table 3. The formulas can be directly generalized to any arbitrary number of levels (see Section 5).
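As a minimal sketch of the three-level calculation, the R code below applies the Table 3 formulas to the six-allele, five-population example data set given with the supplementary R code (two populations in Region 1, three in Region 2). It is a simplified illustration restricted to three levels, not the general iDIP function, and the helper name shannon is an assumption made here.

```r
# Shannon entropy of a vector of counts (zero counts are ignored)
shannon <- function(x) { p <- x[x > 0] / sum(x); -sum(p * log(p)) }

# Example allele counts: alleles in rows, populations in columns
abun <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)                       # region of each population

w_jk <- colSums(abun) / sum(abun)                # population weights
w_k  <- tapply(w_jk, region, sum)                # region weights

# Pool counts within each region (columns ordered as the names of w_k)
reg_abun <- sapply(sort(unique(region)),
                   function(k) rowSums(abun[, region == k, drop = FALSE]))

H_gamma  <- shannon(rowSums(abun))               # ecosystem entropy
H_alpha2 <- sum(w_k  * apply(reg_abun, 2, shannon))  # weighted mean over regions
H_alpha1 <- sum(w_jk * apply(abun, 2, shannon))      # weighted mean over populations

D <- exp(c(gamma = H_gamma, alpha2 = H_alpha2, alpha1 = H_alpha1))
beta2 <- D[["gamma"]]  / D[["alpha2"]]           # region equivalents
beta1 <- D[["alpha2"]] / D[["alpha1"]]           # population equivalents per region

# Differentiation among regions and among populations within regions
delta2 <- (H_gamma - H_alpha2) / (-sum(w_k * log(w_k)))
delta1 <- (H_alpha2 - H_alpha1) /
  (-sum(w_jk * log(w_jk / w_k[as.character(region)])))
```

With these data the sketch should reproduce, up to rounding, the values listed in the example output of iDIP (D_gamma, D_alpha2, D_alpha1 and the two differentiation measures).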
4.2 | Formulation in terms of phylogenetic diversity

We first present an overview of phylogenetic diversity measures applied to a single nonhierarchical case, henceforth referred to as single aggregate for brevity, and then extend it to consider a hierarchically structured system.

4.2.1 | Phylogenetic diversity measures in a single aggregate
To formulate phylogenetic diversity in a single aggregate, we assume that all species or alleles in an aggregate are connected by a rooted ultrametric or nonultrametric phylogenetic tree, with all species/alleles as tip nodes. All phylogenetic diversity measures discussed below are computed from a given fixed tree base, or a time reference point, that is ancestral to all species/alleles in the aggregate. A convenient time reference point is the age of the root of the phylogenetic tree spanned by all elements. Assume that there are B branch segments in the tree and thus there are B corresponding nodes, B ≥ S. The set of species/alleles is expanded to include also the internal nodes as well as the terminal nodes representing species/alleles, which will then be the first S elements (see Figure S2).

FIGURE 1 The spatial representation of 32 populations organized into a spatial hierarchy based on three scale levels: subregions (eight populations each), regions (16 populations each) and the ecosystem (all 32 populations). The dendrogram (upper panel: hierarchical representation of levels) represents the spatial relationship (i.e. geographic distance), in which each tip represents a population found in a particular site (lower panel). The cartographic representation (lower panel) represents the spatial distribution of these same populations along a geographic coordinate system.
Let $L_i$ denote the length of branch i in the tree, i = 1, 2, …, B. We first expand the set of relative abundances of elements $(p_1, p_2, \dots, p_S)$ (see Equation 1) to a larger set $\{a_i, i = 1, 2, \dots, B\}$ by defining $a_i$ as the total relative abundance of the elements descended from the ith node/branch, i = 1, 2, …, B. In phylogenetic diversity, an important parameter is the mean branch length $\bar{T}$, the abundance-weighted mean of the distances from the tree base to each of the terminal branch tips, that is, $\bar{T} = \sum_{i=1}^{B} L_i a_i$. For an ultrametric tree, the mean branch length is simply reduced to the tree depth T; see Figure 1 in Chao, Chiu and Jost (2010) for an example. For simplicity, our following formulation of phylogenetic diversity is based on ultrametric trees. The extension to nonultrametric trees is straightforward (via replacing T by $\bar{T}$ in all formulas).
Chao et al. (2010, 2014) generalized Hill numbers to a class of phylogenetic diversity of order q, $^{q}PD$, given in Equation 4. This measure quantifies the effective total branch length during the time interval from T years ago to the present. If q = 0, then $^{0}PD = \sum_{i=1}^{B} L_i$, which is the well-known Faith's PD, the sum of the branch lengths of a phylogenetic tree connecting all species. However, this measure does not consider species abundances. Rao's quadratic entropy Q (Rao & Nayak, 1985) is a widely used measure which takes into account both phylogeny and species abundances. This measure is a generalization of the Gini–Simpson index and quantifies the average phylogenetic distance between any two individuals randomly selected from the assemblage. Chao et al. (2010) showed that the $^{q}PD$ measure of order q = 2 is a simple transformation of quadratic entropy, that is, $^{2}PD = T/(1 - Q/T)$. Again, here we focus on the $^{q}PD$ measure of order q = 1, which can be expressed as a function of the phylogenetic entropy (Allen, Kon, & Bar-Yam, 2009), as shown in Equation 5.

Here, I denotes the phylogenetic entropy (Equation 6), which is a generalization of Shannon's entropy that incorporates phylogenetic distances among elements. Note that when there are only tip nodes and all branches have unit length, then we have T = 1 and $^{q}PD$ reduces to the Hill number of order q (in Equation 1).
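The single-aggregate phylogenetic measures can be sketched in R for a small tree shaped like the one in Figure S2 (five tips, three internal nodes). The tip abundances and branch lengths below are hypothetical and were chosen so that the tree is ultrametric with depth 3; the function name phylo_div is an assumption made for this illustration.

```r
p <- c(0.30, 0.20, 0.10, 0.25, 0.15)      # hypothetical tip abundances p1..p5

# Node abundances: the five tips, then internal nodes (1,2), (1,2,3) and (4,5)
a <- c(p, p[1] + p[2], p[1] + p[2] + p[3], p[4] + p[5])
# Hypothetical branch lengths in the same order (every tip lies at depth 3)
L <- c(1, 1, 2, 2, 2, 1, 1, 1)

# Phylogenetic diversity of order q for branch lengths L and node abundances a
phylo_div <- function(L, a, q) {
  keep <- a > 0
  L <- L[keep]; a <- a[keep]
  T_bar <- sum(L * a)                     # mean branch length (tree depth if ultrametric)
  if (isTRUE(all.equal(q, 1))) {
    exp(-sum(L * (a / T_bar) * log(a / T_bar)))
  } else {
    sum(L * (a / T_bar)^q)^(1 / (1 - q))
  }
}

T_bar <- sum(L * a)                       # abundance-weighted mean tip depth
I     <- -sum(L * a * log(a))             # phylogenetic entropy
PD0   <- phylo_div(L, a, 0)               # Faith's PD: total branch length
PD1   <- phylo_div(L, a, 1)               # equals T_bar * exp(I / T_bar)
```

Setting all branch lengths to 1 and keeping only the tips gives T = 1, in which case phylo_div returns the ordinary Hill number of the tip abundances.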
4.2.2 | Phylogenetic diversity decomposition in a multiple-level hierarchically structured system

The single-aggregate formulation can be extended to consider a hierarchical spatially structured system. For the sake of simplicity, we consider three levels (ecosystem, region and community/population), as we did for the species/allelic diversity decomposition. Assume that there are S elements in the ecosystem. For the rooted phylogenetic tree spanned by all S elements in the ecosystem, we define root (or a time reference point), number of nodes/branches B and branch length $L_i$ in a similar manner as those in a single aggregate.
For the tip nodes, as in the framework of species and allelic diversity (in Table 2), define $p_{i|jk}$, $p_{i|+k}$ and $p_{i|++}$, i = 1, 2, …, S, as the ith species or allele relative frequencies at the population, regional and ecosystem level, respectively. To expand these relative frequencies to the branch set, we define $a_{i|jk}$, i = 1, 2, …, B, as the summed relative abundance of the species/alleles descended from the ith node/branch in population j and region k, with similar definitions for $a_{i|+k}$ and $a_{i|++}$, i = 1, 2, …, B; see Figure 1 of Chao et al. (2015) for an illustrative example. The decomposition for phylogenetic diversity is similar to that for Hill numbers presented in Table 1, except that now all measures are replaced by phylogenetic diversity. The corresponding phylogenetic gamma, alpha and beta diversities at each level are given in Table 3, along with the corresponding differentiation measures. Appendix S3 presents all mathematical derivations and discusses the desirable monotonicity and "true dissimilarity" properties that our proposed differentiation measures possess.

$$^{q}PD = \left[\sum_{i=1}^{B} L_i \left(\frac{a_i}{T}\right)^{q}\right]^{1/(1-q)} \tag{4}$$

$$^{1}PD = \lim_{q \to 1} {}^{q}PD = \exp\left[-\sum_{i=1}^{B} L_i \frac{a_i}{T} \ln\left(\frac{a_i}{T}\right)\right] \equiv T \exp(I/T) \tag{5}$$

$$I = -\sum_{i=1}^{B} L_i a_i \ln a_i \tag{6}$$

TABLE 1 Various diversities in a hierarchically structured system and their decomposition based on diversity measure $D = {}^{1}D$ (Hill number of order q = 1 in Equation 2); for phylogenetic diversity decomposition, replace D with $PD = {}^{1}PD$ (phylogenetic diversity measure of order q = 1 in Equation 5); see Table 3 for all formulas for D and PD. The superscripts (1) and (2) denote the hierarchical level of focus.

| Hierarchical level | Within | Between | Total | Decomposition |
|---|---|---|---|---|
| 3 Ecosystem | − | − | $D_\gamma$ | $D_\gamma = D_\alpha^{(1)} D_\beta^{(1)} D_\beta^{(2)}$ |
| 2 Region | $D_\alpha^{(2)}$ | $D_\beta^{(2)} = D_\gamma^{(2)} / D_\alpha^{(2)}$ | $D_\gamma^{(2)} = D_\gamma$ | $D_\gamma = D_\alpha^{(2)} D_\beta^{(2)}$ |
| 1 Community or population | $D_\alpha^{(1)}$ | $D_\beta^{(1)} = D_\gamma^{(1)} / D_\alpha^{(1)}$ | $D_\gamma^{(1)} = D_\alpha^{(2)}$ | $D_\alpha^{(2)} = D_\alpha^{(1)} D_\beta^{(1)}$ |

TABLE 2 Calculation of allele/species relative frequencies at the different levels of the hierarchical structure

| Hierarchical level | Species/allele relative frequency |
|---|---|
| Population | $p_{i \mid jk} = N_{ijk}/N_{+jk} = N_{ijk} \big/ \sum_{i=1}^{S} N_{ijk}$ |
| Region | $p_{i \mid +k} = N_{i+k}/N_{++k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i \mid jk}$ |
| Ecosystem | $p_{i \mid ++} = N_{i++}/N_{+++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i \mid jk}$ |
5 | IMPLEMENTATION OF THE FRAMEWORK BY MEANS OF AN R PACKAGE

The framework described above has been implemented in the R function iDIP (information-based Diversity Partitioning), which is provided as Data S1. We also provide a short introduction with a simple example data set to explain how to obtain numerical results equivalent to those provided in Tables 4 and 5 below for the Hawaiian archipelago example data set.

The R function iDIP requires two input matrices:

1. Abundance data specifying species/alleles (rows) raw or relative abundances for each population/community (columns).
2. Structure matrix describing the hierarchical structure of spatial subdivision; see a simple example given in Data S1. There is no limit to the number of spatial subdivisions.

The output includes (i) gamma (or total) diversity, alpha and beta diversity for each level; (ii) proportion of total beta information (among aggregates) found at each level; and (iii) mean differentiation (dissimilarity) at each level.

We also provide the R function iDIPphylo, which implements an information-based decomposition of phylogenetic diversity and therefore can take into account the evolutionary history of the species being studied. This function requires the two matrices mentioned above plus a phylogenetic tree in Newick format. For interested users without knowledge of R, we also provide an online version available from https://chao.shinyapps.io/iDIP/. This interactive web application was developed using Shiny (https://shiny.rstudio.com). The webpage contains tabs providing a short introduction describing how to use the tool, along with a detailed User's Guide, which provides proper interpretations of the output through numerical examples.
6 | SIMULATION STUDY TO SHOW THE CHARACTERISTICS OF THE FRAMEWORK

Here, we describe a simple simulation study to demonstrate the utility and numerical behaviour of the proposed framework. We considered an ecosystem composed of 32 populations divided into four hierarchical levels (ecosystem, region, subregion, population; Figure 1). The number of populations at each level was kept constant across all simulations (i.e. ecosystem with 32 populations, regions with 16 populations each and subregions with eight populations each). Note that here we used a hierarchy with four spatial subdivisions instead of three levels as used in the presentation of the framework. This decision was based on the fact that we wanted to simplify the presentation of calculations (three levels used), and in the simulations (four levels used), we wanted to verify the performance of the framework in a more in-depth manner.

TABLE 3 Formulas for α, β and γ along with differentiation measures at each hierarchical level of spatial subdivision for species/allelic diversity and phylogenetic diversity. Here $D = {}^{1}D$ (Hill number of order q = 1 in Equation 2), $PD = {}^{1}PD$ (phylogenetic diversity of order q = 1 in Equation 5), T denotes the depth of an ultrametric tree, H = Shannon entropy (Equation 2), I = phylogenetic entropy (Equation 6).

| Hierarchical level | Diversity | Species/allelic diversity | Phylogenetic diversity |
|---|---|---|---|
| Level 3: Ecosystem | gamma | $D_\gamma = \exp\left(-\sum_{i=1}^{S} p_{i \mid ++} \ln p_{i \mid ++}\right) \equiv \exp(H_\gamma)$ | $PD_\gamma = T \times \exp\left(-\sum_{i=1}^{B} L_i a_{i \mid ++} \ln a_{i \mid ++} \big/ T\right) \equiv T \times \exp(I_\gamma / T)$ |
| Level 2: Region | gamma | $D_\gamma^{(2)} = D_\gamma$ | $PD_\gamma^{(2)} = PD_\gamma$ |
| | alpha | $D_\alpha^{(2)} = \exp\left(H_\alpha^{(2)}\right)$, where $H_\alpha^{(2)} = \sum_{k} w_{+k} H_{\alpha k}^{(2)}$ and $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i \mid +k} \ln p_{i \mid +k}$ | $PD_\alpha^{(2)} = T \times \exp\left(I_\alpha^{(2)} / T\right)$, where $I_\alpha^{(2)} = \sum_{k} w_{+k} I_{\alpha k}^{(2)}$ and $I_{\alpha k}^{(2)} = -\sum_{i=1}^{B} L_i a_{i \mid +k} \ln a_{i \mid +k}$ |
| | beta | $D_\beta^{(2)} = D_\gamma^{(2)} / D_\alpha^{(2)}$ | $PD_\beta^{(2)} = PD_\gamma^{(2)} / PD_\alpha^{(2)}$ |
| Level 1: Population or community | gamma | $D_\gamma^{(1)} = D_\alpha^{(2)}$ | $PD_\gamma^{(1)} = PD_\alpha^{(2)}$ |
| | alpha | $D_\alpha^{(1)} = \exp\left(H_\alpha^{(1)}\right)$, where $H_\alpha^{(1)} = \sum_{j,k} w_{jk} H_{\alpha jk}^{(1)}$ and $H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i \mid jk} \ln p_{i \mid jk}$ | $PD_\alpha^{(1)} = T \times \exp\left(I_\alpha^{(1)} / T\right)$, where $I_\alpha^{(1)} = \sum_{j,k} w_{jk} I_{\alpha jk}^{(1)}$ and $I_{\alpha jk}^{(1)} = -\sum_{i=1}^{B} L_i a_{i \mid jk} \ln a_{i \mid jk}$ |
| | beta | $D_\beta^{(1)} = D_\gamma^{(1)} / D_\alpha^{(1)}$ | $PD_\beta^{(1)} = PD_\gamma^{(1)} / PD_\alpha^{(1)}$ |
| Differentiation among aggregates at each level | | | |
| Level 2: Among regions | | $\Delta_D^{(2)} = \dfrac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k} w_{+k} \ln w_{+k}}$ | $\Delta_{PD}^{(2)} = \dfrac{I_\gamma - I_\alpha^{(2)}}{-T \sum_{k} w_{+k} \ln w_{+k}}$ |
| Level 1: Population/community within region | | $\Delta_D^{(1)} = \dfrac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{j,k} w_{jk} \ln (w_{jk}/w_{+k})}$ | $\Delta_{PD}^{(1)} = \dfrac{I_\alpha^{(2)} - I_\alpha^{(1)}}{-T \sum_{j,k} w_{jk} \ln (w_{jk}/w_{+k})}$ |
Weexploredsixscenariosvaryinginthedegreeofgeneticstruc-turingfromverystrong(Figure2topleftpanel)toveryweak(Figure2bottomrightpanel)and foreachwegeneratedspatiallystructuredgeneticdatafor10unlinkedbi-allelic lociusinganalgorithmlooselybased on the genetic model of Coop Witonsky Di Rienzo andPritchard (2010) More explicitly to generate correlated allele fre-quenciesacrosspopulationsforbi-alleliclociwedraw10randomvec-tors of dimension 32 from a multivariate normal distribution with mean zero and a covariancematrix corresponding to the particulargeneticstructurescenariobeingconsideredToconstructthecovari-ancematrixwe firstassumed that thecovariancebetweenpopula-tions decreased with distance so that the off-diagonal elements(covariances)forclosestgeographicneighboursweresetto4forthesecond nearest neighbours were set to 3 and so on as such the main diagonalvalues(variance)weresetto5Bymultiplyingtheoff-diagonalelementsofthisvariancendashcovariancematrixbyaconstant(δ)wema-nipulated the strength of the spatial genetic structure from strong(δ=01Figure2)toweak(δ=6Figure2)Deltavalueswerechosentodemonstrategradualchangesinestimatesacrossdiversitycompo-nentsUsing this procedurewegenerated amatrix of randomnor-mally distributed N(01)deviatesɛilforeachpopulation i and locus l The randomdeviateswere transformed into allele frequencies con-strainedbetween0and1usingthesimpletransform
where pilistherelativefrequencyofalleleA1atthelthlocusinpopu-lation i and therefore qil= (1minuspil)istherelativefrequencyofalleleA2Eachbi-allelic locuswasanalysedseparatelybyour frameworkandestimated values of DγDα andDβ foreachspatial level (seeFigure1)were averaged across the 10 loci
Tosimulatearealisticdistributionofnumberofindividualsacrosspopulationswegeneratedrandomvaluesfromalog-normaldistribu-tion with mean 0 and log of standard deviation 1 these values were thenmultipliedbyrandomlygenerateddeviates fromaPoissondis-tribution with λ=30 toobtainawide rangeofpopulationcommu-nitysizesRoundedvalues(tomimicabundancesofindividuals)werethenmultipliedbypil and qiltogeneratealleleabundancesGiventhat
number of individuals was randomly generated across populationsthereisnospatialcorrelationinabundanceofindividualsacrossthelandscapewhichmeansthatthegeneticspatialpatternsweresolelydeterminedbythevariancendashcovariancematrixusedtogeneratecor-relatedallelefrequenciesacrosspopulationsThisfacilitatesinterpre-tation of the simulation results allowing us to demonstrate that the frameworkcanuncoversubtlespatialeffectsassociatedwithpopula-tionconnectivity(seebelow)
For each spatial structurewe generated 100matrices of allelefrequenciesandeachmatrixwasanalysedseparatelytoobtaindistri-butions for DγDαDβ and ΔDFigure2presentsheatmapsofthecor-relationinallelefrequenciesacrosspopulationsforonesimulateddataset under each δ value and shows that our algorithm can generate a wide rangeofgenetic structurescomparable to thosegeneratedbyothermorecomplexsimulationprotocols(egdeVillemereuilFrichotBazinFrancoisampGaggiotti2014)
Figure3 shows the distribution of DαDβ and ΔDvalues for the threelevelsofgeographicvariationbelowtheecosystemlevel(ieDγ geneticdiversity)TheresultsclearlyshowthatourframeworkdetectsdifferencesingeneticdiversityacrossdifferentlevelsofspatialgeneticstructureAsexpectedtheeffectivenumberofalleles(Dαcomponenttoprow) increasesper regionandsubregionas thespatialstructurebecomesweaker (ie fromsmall to largeδvalues)butremainscon-stant at thepopulation level as there is no spatial structure at thislevel(iepopulationsarepanmictic)sodiversityisindependentofδ
TheDβcomponent(middlerow)quantifiestheeffectivenumberofaggregates(regionssubregionspopulations)ateachhierarchicallevelofspatialsubdivisionThelargerthenumberofaggregatesatagivenlevelthemoreheterogeneousthatlevelisThusitisalsoameasureofcompositionaldissimilarityateachlevelWeusethisinterpretationtodescribetheresultsinamoreintuitivemannerAsexpectedasδ in-creasesdissimilaritybetweenregions(middleleftpanel)decreasesbe-causespatialgeneticstructurebecomesweakerandthecompositionaldissimilarityamongpopulationswithinsubregions(middlerightpanel)increases because the strong spatial correlation among populationswithinsubregionsbreaksdown(Figure3centreleftpanel)Thecom-positional dissimilarity between subregionswithin regions (Figure3middlecentrepanel)firstincreasesandthendecreaseswithincreasingδThisisduetoanldquoedgeeffectrdquoassociatedwiththemarginalstatusofthesubregionsattheextremesofthespeciesrange(extremerightandleftsubregionsinFigure2)Asδincreasesthecompositionofthetwosubregionsat thecentreof the species rangewhichbelong todifferentregionschangesmorerapidlythanthatofthetwomarginalsubregionsThusthecompositionaldissimilaritybetweensubregionswithinregionsincreasesHoweverasδcontinuestoincreasespatialeffectsdisappearanddissimilaritydecreases
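$$
p_{il} =
\begin{cases}
0 & \text{if } \varepsilon_{il} < 0 \\
\varepsilon_{il} & \text{if } 0 \le \varepsilon_{il} \le 1 \\
1 & \text{if } \varepsilon_{il} > 1
\end{cases}
$$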
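The truncation rule above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `truncate_to_frequencies` and the example values of ε are hypothetical, and the way ε itself is simulated is not reproduced here.

```python
import numpy as np


def truncate_to_frequencies(eps):
    """Map raw simulated values eps_il to valid allele frequencies p_il.

    Values below 0 become 0, values above 1 become 1, and values already
    inside [0, 1] are left unchanged (the piecewise rule for p_il).
    """
    return np.clip(np.asarray(eps, dtype=float), 0.0, 1.0)


# Hypothetical raw values, chosen only to exercise the three cases.
eps = np.array([-0.3, 0.0, 0.42, 1.0, 1.7])
p = truncate_to_frequencies(eps)
print(p)
```

Clipping keeps every simulated frequency interpretable as a proportion, at the cost of piling some probability mass at exactly 0 and 1 when the raw values are widely dispersed.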
FIGURE 2 Heatmaps of allele frequency correlations between pairs of populations for different δ values. Delta values control the strength of the spatial genetic structure among populations, with low δs having the strongest spatial correlation among populations. Each heatmap represents the outcome of a single simulation, and each dot represents the allele frequency correlation between two populations. Thus, the diagonal represents the correlation of a population with itself and is always 1 regardless of the δ value considered in the simulation. Colours indicate range of correlation values. As in Figure 1, the dendrograms represent the spatial relationship (i.e., geographic distance) between populations.
The differentiation component ΔD (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity Dβ. This is expected, as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix in which different δ values would be used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength in spatial genetic variation.

FIGURE 3 Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity, gamma, is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions)
For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ: Dγ = 1.6 on average across simulations for δ = 0.1, up to Dγ = 1.9 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here region, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as for the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.
7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM
All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species: a deep-water fish, Etelis coruscans (Andrews et al., 2014), and a shallow-water fish, Zebrasoma flavescens (Eble et al., 2011).
The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km south-west of the MHI at Johnston Atoll and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).

FIGURE 4 Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass
We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.
7.1 | Species diversity
Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, Dγ, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_{\alpha}^{(2)} = 37.77$; Island: $D_{\alpha}^{(1)} = 27.75$). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_{\beta}^{(2)} = 1.29$ represents the number of region equivalents in the Hawaiian archipelago, while $D_{\beta}^{(1)} = 1.361$ is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain ΔD (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
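As a rough consistency check on the species diversity values quoted above, the sketch below assumes that the Table 4 entries read 48.744, 37.773, 27.752, 1.290 and 1.361 (decimal points inferred from the rounded figures in the text) and that beta diversity at each level is the ratio of gamma to alpha diversity at that level. Variable names are illustrative only.

```python
# Assumed Table 4 values (order q = 1); decimal placement is an inference.
gamma_ecosystem = 48.744          # D_gamma for the whole archipelago
alpha_region, beta_region = 37.773, 1.290   # level 2
alpha_island, beta_island = 27.752, 1.361   # level 1

# Multiplicative partition at each level: beta = gamma / alpha.
# Level 2 uses the ecosystem gamma; level 1 uses the regional alpha as gamma.
ratio_region = gamma_ecosystem / alpha_region
ratio_island = alpha_region / alpha_island

# Chaining the levels should recover the ecosystem gamma up to rounding.
reconstructed_gamma = alpha_island * beta_island * beta_region

print(round(ratio_region, 3), round(ratio_island, 3))
print(round(reconstructed_gamma, 2))
```

The small gap between the reconstructed and reported gamma diversity reflects only the three-decimal rounding of the tabulated values.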
We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than those of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and, to a lesser extent, Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
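The effect of the order q described above can be sketched in Python using the standard definition of Hill numbers (Hill, 1973). The function name `hill_number` and both abundance vectors are hypothetical; they are not the Hawaiian data.

```python
import numpy as np


def hill_number(abundances, q):
    """Hill number (effective number of species or alleles) of order q."""
    p = np.asarray(abundances, dtype=float)
    p = p[p > 0]
    p = p / p.sum()                      # relative abundances
    if q == 1:
        # Limit as q -> 1: exponential of Shannon entropy.
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))


# Hypothetical communities with 10 species each.
even = [10] * 10                 # all species equally common
dominated = [55] + [5] * 9       # one species holds 55% of individuals

for name, community in (("even", even), ("dominated", dominated)):
    d0, d1, d2 = (hill_number(community, q) for q in (0, 1, 2))
    print(name, round(d0, 2), round(d1, 2), round(d2, 2))
```

In the even community all three orders coincide with the species count, whereas in the dominated community the values drop as q increases because rare species contribute progressively less.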
Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.
7.2 | Genetic diversity
Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases, genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that, despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.

TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

Level                      Diversity
3 Hawaiian Archipelago     $D_{\gamma} = 48.744$
2 Region                   $D_{\gamma}^{(2)} = D_{\gamma}$; $D_{\alpha}^{(2)} = 37.773$; $D_{\beta}^{(2)} = 1.290$
1 Island (community)       $D_{\gamma}^{(1)} = D_{\alpha}^{(2)}$; $D_{\alpha}^{(1)} = 27.752$; $D_{\beta}^{(1)} = 1.361$

Differentiation among aggregates at each level
2 Region                   $\Delta_{D}^{(2)} = 0.290$
1 Island (community)       $\Delta_{D}^{(1)} = 0.153$
Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.
As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).
8 | DISCUSSION
Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications, including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.
Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and, therefore, give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g., bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.

FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities; (b) genetic diversity for Etelis coruscans; (c) genetic diversity for Zebrasoma flavescens
As proved by Chao et al. (2015, Appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population, and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost's D, do not possess either of these two properties.
Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.

FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness; (b) allelic richness for Etelis coruscans; (c) allelic richness for Zebrasoma flavescens

TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level                      Diversity
3 Hawaiian Archipelago     $D_{\gamma} = 8.249$
2 Region                   $D_{\gamma}^{(2)} = D_{\gamma}$; $D_{\alpha}^{(2)} = 8.083$; $D_{\beta}^{(2)} = 1.016$
1 Island (population)      $D_{\gamma}^{(1)} = D_{\alpha}^{(2)}$; $D_{\alpha}^{(1)} = 7.077$; $D_{\beta}^{(1)} = 1.117$

Differentiation among aggregates at each level
2 Region                   $\Delta_{D}^{(2)} = 0.023$
1 Island (community)       $\Delta_{D}^{(1)} = 0.062$
Recently, Karlin and Smouse (2017, Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
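The two-population example above can be checked numerically. The sketch below is an illustration, not the authors' implementation: it assumes two levels and equal population weights, in which case Shannon differentiation is taken to be (H_γ − H_α) / ln N, where H_γ is the entropy of the pooled frequencies and H_α the mean within-population entropy. Function names and the allele layout are hypothetical.

```python
import numpy as np


def shannon_entropy(p):
    """Shannon entropy (natural log) of a frequency vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def shannon_differentiation(freqs):
    """Normalized mutual information for equally weighted populations.

    freqs: one row per population, one column per allele.
    Assumes equal weights, so the normalizer is ln(number of populations).
    """
    freqs = np.asarray(freqs, dtype=float)
    n_pops = freqs.shape[0]
    h_alpha = np.mean([shannon_entropy(row) for row in freqs])
    h_gamma = shannon_entropy(freqs.mean(axis=0))
    return float((h_gamma - h_alpha) / np.log(n_pops))


# 16 distinct alleles: 4 shared, 6 private to population I, 6 private to II.
pop_1 = [0.1] * 4 + [0.1] * 6 + [0.0] * 6
pop_2 = [0.1] * 4 + [0.0] * 6 + [0.1] * 6
delta = shannon_differentiation([pop_1, pop_2])
print(round(delta, 4))
```

Under these assumptions the measure equals 1 − A/S = 1 − 4/10, the proportion of nonshared alleles in each population.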
In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would be surely observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
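The negative bias of the observed (plug-in) entropy under undersampling can be illustrated with a small simulation. This is a sketch under hypothetical settings (50 equally common species, samples of 20 individuals, 2,000 replicates); it does not implement the Chao and Jost (2015) estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical community: 50 equally common species.
n_species, sample_size, n_reps = 50, 20, 2000
true_p = np.full(n_species, 1.0 / n_species)
true_entropy = float(np.log(n_species))


def plug_in_entropy(counts):
    """Observed entropy: sample proportions substituted into Shannon's formula."""
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))


# Repeatedly draw small samples and compute the observed entropy.
estimates = np.array([
    plug_in_entropy(rng.multinomial(sample_size, true_p))
    for _ in range(n_reps)
])
mean_bias = float(estimates.mean() - true_entropy)
print(round(true_entropy, 3), round(float(estimates.mean()), 3), round(mean_bias, 3))
```

With only 20 individuals, the observed entropy cannot exceed ln 20, so it necessarily falls short of the true value ln 50; the bias shrinks as the sample size grows relative to the number of species.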
Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity Dβ and differentiation ΔD measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.
The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.
Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

Level                      Diversity
3 Hawaiian Archipelago     $D_{\gamma} = 8.404$
2 Region                   $D_{\gamma}^{(2)} = D_{\gamma}$; $D_{\alpha}^{(2)} = 8.290$; $D_{\beta}^{(2)} = 1.012$
1 Island (community)       $D_{\gamma}^{(1)} = D_{\alpha}^{(2)}$; $D_{\alpha}^{(1)} = 7.690$; $D_{\beta}^{(1)} = 1.065$

Differentiation among aggregates at each level
2 Region                   $\Delta_{D}^{(2)} = 0.014$
1 Island (community)       $\Delta_{D}^{(1)} = 0.033$
We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases, there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization, and therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
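The percentage differences between regional and island levels quoted above appear to correspond to the relative drop in alpha diversity, (regional alpha − island alpha) / regional alpha. The sketch below works through that arithmetic under two assumptions: that this is the intended formula, and that the alpha diversities in Tables 4 to 6 read 37.773 and 27.752 (fish species), 8.083 and 7.077 (E. coruscans), and 8.290 and 7.690 (Z. flavescens), with decimal points inferred. The dictionary keys are illustrative labels.

```python
# Assumed (regional alpha, island alpha) pairs of order q = 1.
alpha = {
    "fish species": (37.773, 27.752),
    "E. coruscans alleles": (8.083, 7.077),
    "Z. flavescens alleles": (8.290, 7.690),
}

# Relative loss of effective species/alleles when moving from the
# regional level down to the island level, in percent.
loss = {
    name: 100.0 * (region - island) / region
    for name, (region, island) in alpha.items()
}

for name, value in loss.items():
    print(f"{name}: {value:.2f}%")
```

Under these assumptions the computed values are close to the quoted percentages, with the species-level drop several times larger than either allelic drop.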
In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)'s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)'s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals, or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithms may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.
The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.
An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.
emspensp emsp | emsp17GAGGIOTTI eT Al
ACKNOWLEDGEMENTS
This work was assisted through participation in the "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.
DATA ARCHIVING STATEMENT
All data used in this manuscript are available in DRYAD (https://doi.org/dx.doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).
ORCID
Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493
Christine Edwards http://orcid.org/0000-0001-8837-4872
REFERENCES
Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2
Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101
Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.
Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665
Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.
Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.
Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654
Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551
Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247
Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272
Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540
Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471
Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014
Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819
Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083
Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1
Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2
Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036
Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75
Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352
Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x
Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x
Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1
Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x
Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum x falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.
Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.
Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.
Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3
MacArthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x
Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.
Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x
Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x
Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x
Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.
Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x
Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.
Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.
Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696
Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B: Biological Sciences, 283(1829).
Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z
Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765
Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x
Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.
Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422
Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756
de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.
Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x
Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516
Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.
Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.
How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593
APPENDIX S1. Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
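The equivalence described above can be checked numerically. The following is a minimal sketch in R, assuming the usual definitions He = 1 - sum(p_i^2) and effective number of alleles = 1 / (1 - He); the helper names `expected_het` and `effective_alleles` are hypothetical and introduced only for illustration.

```r
# Allele frequencies from the Appendix S1 table.
pop1 <- c(A1 = 0.5,  A2 = 0.5,  A3 = 0,     A4 = 0,     A5 = 0)
pop2 <- c(A1 = 0.01, A2 = 0.10, A3 = 0.665, A4 = 0.215, A5 = 0.01)

# Expected heterozygosity: He = 1 - sum(p_i^2) (hypothetical helper).
expected_het <- function(p) 1 - sum(p^2)

# Effective number of alleles of order 2: 1 / (1 - He) (hypothetical helper).
effective_alleles <- function(p) 1 / (1 - expected_het(p))

he <- c(pop1 = expected_het(pop1), pop2 = expected_het(pop2))
ne <- c(pop1 = effective_alleles(pop1), pop2 = effective_alleles(pop2))
print(round(he, 3))
print(round(ne, 3))
```

Both populations should give an effective number of alleles very close to two, even though population 2 carries five distinct alleles.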
APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol                      Definition
$^qD$                       abundance diversity of order q, also referred to as Hill number of order q
$^qPD$                      phylogenetic diversity of order q
$S$                         total number of distinct elements (species/alleles)
$H$                         Shannon entropy
$K$                         number of regions
$J_k$                       number of local populations/communities within region k
$w_{jk}$                    weight given to population/community j of region k
$w_{+k}$                    weight given to region k
$D_\alpha^{(l)}$            alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$             beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$            gamma abundance diversity at level l of the hierarchy
$l$                         superscript to denote population/community, subregion, region, …
$D_\gamma$                  gamma abundance diversity at the ecosystem level
$N_{injk}$                  number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$                   total number of copies of allele/species i in population/community j of region k
$N_{+jk}$                   total number of alleles/individuals in population j of region k
$N_{++k}$                   total number of alleles/individuals in region k
$N_{+++}$                   total number of alleles/individuals in the ecosystem
$p_{i|jk}$                  frequency of allele/species i in population j of region k
$p_{i|+k}$                  frequency of allele/species i in region k
$p_{i|++}$                  frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$       alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$        alpha entropy for region k
$H_\alpha^{(l)}$            total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$            abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$                         number of branch segments in the phylogenetic tree
$L_i$                       length of branch i = 1, 2, 3, …, B
$a_i$                       total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$                   mean branch length
$T$                         depth of an ultrametric tree (= $\bar{T}$)
$Q$                         Rao's quadratic entropy
$I$                         phylogenetic entropy
$a_{i|jk}$                  total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$                  total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$                  total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$           alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$            beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$           gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$                 gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$       alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$        alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$            total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$         phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)
APPENDIX S3. Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{m=1}^{J} w_m p_{i|m} \Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \ln x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows. Because $\sum_{m=1}^{J} w_m p_{i|m} \ge w_j p_{i|j}$ for every j and the logarithm is increasing,

$$H_\gamma = -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{m=1}^{J} w_m p_{i|m} \Big) \le -\sum_{i=1}^{S} \sum_{j=1}^{J} w_j p_{i|j} \ln \big( w_j p_{i|j} \big)$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
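The two boundary cases of Theorem S3.1 can be illustrated numerically. The sketch below is an assumption-laden illustration rather than part of the original code: `shannon` and `entropy_gap` are hypothetical helper names, and the frequency matrices and weights are made-up inputs.

```r
# Shannon entropy, skipping zero frequencies (0 * ln 0 is taken as 0).
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# P: alleles-by-populations matrix of relative frequencies (columns sum to 1).
# w: population weights summing to 1.
# Returns H_gamma - H_alpha and its upper bound -sum(w * ln w).
entropy_gap <- function(P, w) {
  h_gamma <- shannon(as.vector(P %*% w))
  h_alpha <- sum(w * apply(P, 2, shannon))
  c(gap = h_gamma - h_alpha, upper = shannon(w))
}

w <- c(0.2, 0.3, 0.5)  # assumed weights

# Case 1: identical allele frequency distributions in all populations.
identical_pops <- cbind(c(0.6, 0.4), c(0.6, 0.4), c(0.6, 0.4))

# Case 2: completely distinct populations (no shared alleles).
distinct_pops <- rbind(c(0.5, 0, 0), c(0.5, 0, 0),
                       c(0, 0.3, 0), c(0, 0.7, 0),
                       c(0, 0, 0.1), c(0, 0, 0.9))

# Case 3: partially overlapping populations.
mixed_pops <- cbind(c(0.7, 0.3, 0), c(0.2, 0.5, 0.3), c(0, 0.4, 0.6))

g_identical <- entropy_gap(identical_pops, w)
g_distinct  <- entropy_gap(distinct_pops, w)
g_mixed     <- entropy_gap(mixed_pops, w)
print(rbind(g_identical, g_distinct, g_mixed))
```

The gap should be zero in the first case, equal to the upper bound in the second, and strictly in between in the third.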
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.
(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) The Shannon differentiation measure is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.
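Property (B) can be illustrated with a small numerical case. The sketch below assumes two equally weighted communities, each with S = 4 equally common species of which A = 2 are shared; `shannon` and `shannon_diff` are hypothetical helper names, not functions from the original code.

```r
# Shannon entropy, skipping zero frequencies.
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# Shannon differentiation (Eq. C1) for a species-by-community matrix P of
# relative frequencies (columns sum to 1) and community weights w.
shannon_diff <- function(P, w) {
  h_gamma <- shannon(as.vector(P %*% w))
  h_alpha <- sum(w * apply(P, 2, shannon))
  (h_gamma - h_alpha) / shannon(w)
}

# Rows 1-2: species shared by both communities; rows 3-4: private to
# community 1; rows 5-6: private to community 2.
P <- cbind(c(1, 1, 1, 1, 0, 0) / 4,
           c(1, 1, 0, 0, 1, 1) / 4)

delta <- shannon_diff(P, w = c(0.5, 0.5))
print(delta)  # compare with 1 - A/S = 1 - 2/4
```

Under these assumptions the measure should return 0.5, the proportion of non-shared species in each community.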
Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, the Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[ H_\alpha^{(2)} - H_\alpha^{(1)} \big] + \big[ H_\gamma - H_\alpha^{(2)} \big]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions, with allele relative frequency $p_{i|+k}$ for allele i in region k and region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big( \sum_{k=1}^{K} w_{+k} p_{i|+k} \Big) \ln \Big( \sum_{m=1}^{K} w_{+m} p_{i|+m} \Big).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations, with allele relative frequency $p_{i|jk}$ for allele i in population j and population weights $(w_{1k}/w_{+k}, w_{2k}/w_{+k}, \ldots, w_{J_k k}/w_{+k})$, j = 1, 2, …, $J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k}) p_{i|jk}$. The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$
Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \tag{C6}$$
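The two normalized measures in Eqs. (C5) and (C6) can be sketched directly from their definitions. The R code below is an illustrative sketch, not the authors' implementation: it assumes population weights equal to relative sample sizes, uses the five-population allele count matrix from the worked iDIP example in this supplement, and introduces hypothetical variable names throughout.

```r
# Shannon entropy, skipping zero frequencies.
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# Raw allele counts (rows = alleles, columns = populations).
counts <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
                c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)   # region index of each population

n    <- sum(counts)
w_jk <- colSums(counts) / n                       # assumed: weights = relative sizes
w_k  <- as.vector(tapply(w_jk, region, sum))      # region weights w_{+k}

# Pool the populations of each region to get region-level allele counts.
region_counts <- sapply(sort(unique(region)), function(k)
  rowSums(counts[, region == k, drop = FALSE]))

h_gamma  <- shannon(rowSums(counts) / n)
h_alpha2 <- sum(w_k  * apply(region_counts, 2, function(x) shannon(x / sum(x))))
h_alpha1 <- sum(w_jk * apply(counts,        2, function(x) shannon(x / sum(x))))

# Eq. (C5): differentiation among regions.
delta2 <- (h_gamma - h_alpha2) / shannon(w_k)
# Eq. (C6): differentiation among populations within regions.
delta1 <- (h_alpha2 - h_alpha1) / (-sum(w_jk * log(w_jk / w_k[region])))

print(round(c(D_gamma = exp(h_gamma), delta2 = delta2, delta1 = delta1), 3))
```

With these inputs the values should agree, up to rounding, with the gamma diversity and the two differentiation values reported in the iDIP example output.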
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case we can decompose the gamma entropy as

$$H_\gamma = -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \big( w_{jk} p_{i|jk} \big)$$

$$= -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big[ \ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k} \Big]$$

$$= -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk} - \sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k}$$

$$= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k}$$

$$\equiv H_\alpha^{(1)} + \big[ H_\alpha^{(2)} - H_\alpha^{(1)} \big] + \big[ H_\gamma - H_\alpha^{(2)} \big].$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

As shown in the proof of Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy
For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The gamma and alpha phylogenetic entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[ I_\alpha^{(2)} - I_\alpha^{(1)} \big] + \big[ I_\gamma - I_\alpha^{(2)} \big]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
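SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represents the ecosystem. Shannon entropies H are calculated from allele/species abundances at each level of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1: diagram with rows "1 Population or Community", "2 Region", and "3 Ecosystem" and columns "Within", "Between", "Total", and "Decomposition". Populations carry frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$ and entropies $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$, $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$; regions carry $p_{i|+1}$, $p_{i|+2}$ and $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$; the ecosystem carries $p_{i|++}$ and $H_\gamma$, with totals $H_\alpha^{(1)}$ and $H_\alpha^{(2)}$.]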
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5, p_1 + p_2, p_1 + p_2 + p_3, p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2: ultrametric tree with tips $p_1$, $p_2$, $p_3$, $p_4$, $p_5$, internal nodes $p_1 + p_2$, $p_4 + p_5$, and $p_1 + p_2 + p_3$, and the root labelled MRCA.]
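The extended abundance set described for Figure S2 can be sketched as follows. This is only an illustration: the branch lengths and tip abundances are assumed (the figure does not give them), the variable names are hypothetical, and the order-1 phylogenetic diversity is computed as $\bar{T} \exp(I / \bar{T})$, the standard Hill-number form of Chao et al. (2010).

```r
# Hypothetical 5-tip ultrametric tree with the topology of Figure S2:
# ((1,2),3) and (4,5) joined at the root (MRCA). All numbers are assumed.
p <- c(0.10, 0.20, 0.30, 0.25, 0.15)   # tip relative abundances

# Branch lengths: five terminal branches, then the branches above the
# internal nodes (1,2), (1,2,3) and (4,5). The tree depth is 3.
L <- c(1, 1, 2, 1.5, 1.5, 1, 1, 1.5)

# Extended abundance set: tips first, then the three internal nodes.
a <- c(p, p[1] + p[2], p[1] + p[2] + p[3], p[4] + p[5])

T_bar   <- sum(L * a)                    # mean branch length (= depth if ultrametric)
I_phylo <- -sum(L * a * log(a))          # phylogenetic entropy
PD1     <- T_bar * exp(I_phylo / T_bar)  # phylogenetic diversity of order 1

print(c(T_bar = T_bar, I = I_phylo, PD1 = PD1, Faith_PD = sum(L)))
```

For an ultrametric tree the mean branch length equals the tree depth, and the order-1 phylogenetic diversity lies between the tree depth and the total branch length.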
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle a "blank" or "NA" entry. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1
        Population 1, Population 2
    Region 2
        Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):
Pop1 Pop2 Pop3 Pop4 Pop5
Allele1 1 16 2 10 15
Allele2 0 0 0 5 14
Allele3 7 12 11 1 0
Allele4 0 5 14 1 21
Allele5 2 1 0 11 10
Allele6 0 1 3 2 0
The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.
Pop1 Pop2 Pop3 Pop4 Pop5
Level3 Ecosystem Ecosystem Ecosystem Ecosystem Ecosystem
Level2 Region1 Region1 Region2 Region2 Region2
Level1 Population1 Population2 Population3 Population4 Population5
For the above simple example, input data for R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

# Run R function
iDIP(Data, Struc)
Output is given below:

                    [,1]
D_gamma            5.272
D_alpha.2          4.679
D_alpha.1          3.540
D_beta.2           1.127
D_beta.1           1.322
Proportion.2       0.300
Proportion.1       0.700
Differentiation.2  0.204
Differentiation.1  0.310
We give simple interpretations for the above output (every "effective" quantity is in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha.2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta.2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).
(1c) D_alpha.1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta.1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha.2).
(2) Proportion.2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion.1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation.2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation.1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
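The multiplicative and proportional relationships in this interpretation can be checked from the reported values alone. The sketch below assumes only the three-decimal numbers printed in the example output; the variable names are hypothetical, and small discrepancies are expected from rounding.

```r
# Reported (rounded) values from the example output.
d_gamma  <- 5.272
d_alpha2 <- 4.679; d_alpha1 <- 3.540
d_beta2  <- 1.127; d_beta1  <- 1.322

# Alpha x beta at each level should reproduce the diversity one level up.
gamma_from_level2  <- d_alpha2 * d_beta2   # compare with D_gamma
alpha2_from_level1 <- d_alpha1 * d_beta1   # compare with D_alpha.2

# Beta information is additive on the log scale, so the proportion found
# at the regional level is ln(beta2) / (ln(beta2) + ln(beta1)).
prop2 <- log(d_beta2) / (log(d_beta2) + log(d_beta1))

print(round(c(gamma_from_level2, alpha2_from_level1, prop2), 3))
```

The log-scale check makes explicit why the two reported proportions sum to one.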
R code for Information-based Diversity Partitioning for Any Number of Levels

Input data:
    abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
    struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)

Output:
    (1) Gamma (or total) diversity, alpha and beta diversity at each level.
    (2) Proportion of total beta information found at each level.
    (3) Differentiation (dissimilarity) for each level. For example, Differentiation.1 measures the mean dissimilarity among populations (level 1) within a region; Differentiation.2 measures the mean dissimilarity among regions (level 2), etc.

NOTE: iDIP cannot handle "blank" data or an "NA" entry. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
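One way to satisfy the two input requirements in the note is to clean the abundance matrix before calling iDIP. The helper below is a hypothetical sketch (`clean_abun` is not part of the original code) that assumes missing entries are coded as NA and should be treated as zero counts.

```r
# Hypothetical helper (not part of iDIP): replace NA entries with 0 and check
# that every population/community column has at least one species/allele.
clean_abun <- function(abun) {
  abun <- as.matrix(abun)
  abun[is.na(abun)] <- 0
  if (any(colSums(abun) <= 0)) {
    stop("each population/community must contain at least one species/allele")
  }
  abun
}

# Made-up example with two missing entries.
raw   <- cbind(Pop1 = c(1, NA, 7), Pop2 = c(16, 0, NA))
clean <- clean_abun(raw)
print(clean)
```

Whether a missing entry really represents a zero count depends on how the data were collected, so this replacement is a choice that should be made deliberately.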
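# Main function iDIP
iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga > 0] / n
  G = sum(-gp * log(gp))                 # gamma (ecosystem) Shannon entropy
  S = length(gp)
  H = nrow(struc)                        # number of hierarchical levels
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  # Lowest level (populations/communities): weights and alpha entropy
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H - 1] = sum(wi * Ai)

  # Intermediate levels: pool the populations belonging to each aggregate
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          ai[, j] = abun[, II]
        } else {
          ai[, j] = rowSums(abun[, II])
        }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i - 1] = sum(wi * Ai)
    }
  }

  # Differentiation, proportion of beta information, and beta diversity
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }

  Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha.", (H - 1):1),
                     paste0("D_beta.", (H - 1):1),
                     paste0("Proportion.", (H - 1):1),
                     paste0("Differentiation.", (H - 1):1))
  return(out)
}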
Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described for species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle a "blank" or "NA" entry. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.
# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) = paste0("Allele",1:6)
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

# Run R function
iDIPphylo(Data, Struc, Tree)
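A quick sanity check on a Newick string is to sum its branch lengths, which gives the total branch length (Faith's PD) of the tree. The sketch below uses only base R and a regular expression, so it assumes a simple Newick string without node labels or comments; the string shown is the example tree as reconstructed here, with its punctuation treated as an assumption.

```r
# Example tree (punctuation reconstructed; treat as an assumption).
tree <- paste0("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,",
               "Allele3:59.19367445):43.49065302,",
               "(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

# Extract every ":<number>" token and strip the leading colon.
tokens         <- regmatches(tree, gregexpr(":[0-9.]+", tree))[[1]]
branch_lengths <- as.numeric(sub(":", "", tokens, fixed = TRUE))

faith_pd <- sum(branch_lengths)
print(length(branch_lengths))
print(round(faith_pd, 3))
```

For real analyses a dedicated parser such as `ape::read.tree` is the more robust choice; the regular expression here is only meant to keep the check dependency-free.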
Output is given below:

               [,1]
Faith's PD  321.525
mean_T       94.169
PD_gamma    274.388
PD_alpha.2  255.194
PD_alpha.1  223.231
PD_beta.2     1.075
PD_beta.1     1.143
PD_prop.2     0.351
PD_prop.1     0.649
PD_diff.2     0.124
PD_diff.1     0.149
The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha.2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta.2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma).
(3c) PD_alpha.1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta.1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha.2).
(4) PD_prop.2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop.1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff.2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff.1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

Input data:
    abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
    struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
    tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study

NOTE: iDIP cannot handle a "blank" or "NA" entry. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

Output:
    (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
    (2) mean_T: weighted (by species abundance) mean of the distances from the root node to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
    (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level.
    (4) PD_prop.1 and PD_prop.2 measure the proportions of total phylogenetic beta information found in Level 1 and Level 2, respectively.
    (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff.1 measures the mean phylogenetic dissimilarity among communities (Level 1) within a region (Level 2); PD_diff.2 measures the mean phylogenetic dissimilarity among regions (Level 2), etc.

Three packages ("ade4", "ape", and "phytools") must be installed first.
install.packages("ade4"); library(ade4)
install.packages("ape"); library(ape)
install.packages("phytools"); library(phytools)

# Main function iDIPphylo
iDIPphylo = function(abun, struc, tree) {
  # Parse the Newick tree and order the abundance rows by tip name
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, b] = 1 if node/branch b lies on the path from the root to tip i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }
  # pA: node-by-population abundances; pB: branch lengths
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
  pB = c(phyloData$leaves, phyloData$nodes)
  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n
  TT = sum(gp * pB)                     # mean_T
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])                  # Faith's PD
  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)
  # Lowest level (populations/communities)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H - 1] = sum(wi * Ai)
  # Intermediate levels (rows 2, ..., H-1 of struc)
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) { pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II]) }
        else { pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II]) }
      }
      # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = ni / sum(ni)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i - 1] = sum(wi * Ai)
    }
  }
  # Differentiation, proportion of beta information and beta at each level
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }
  # Gamma = exp(G) * TT; Alpha = exp(A) * TT; Diff = Diff; Prop = Prop
  Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  # out1 = iDIP(abun, struc)
  # out = cbind(out1, out2)
  rownames(out) <- c(paste("Faith's PD"), paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H - 1):1),
                     paste0("PD_beta", (H - 1):1),
                     paste0("PD_prop", (H - 1):1),
                     paste0("PD_diff", (H - 1):1))
  return(out)
}
weight for region k thus becomes $w_{+k} = \sum_{j=1}^{J_k} w_{jk} = N_{++k}/N_{+++}$. Table 2 describes how allele/species relative frequencies at each level are calculated in terms of these weight functions.
Using these frequencies, we can calculate the genetic diversities at each level of spatial organization. Table 3 presents the formulas for $D_\alpha^{(1)}$, $D_\alpha^{(2)}$ and $D_\gamma$; all other diversity measures can be derived from them (see Table 1). In the case of the ecosystem diversity, this amounts to simply replacing $p_i$ in Equation 2 by $p_{i|++}$, the allele frequency at the ecosystem level (see Table 2). To calculate the diversity at the regional level, we first calculate the entropy $H_{\alpha k}^{(2)}$ for each individual region k and then obtain the weighted average over all regions, $H_\alpha^{(2)}$. Finally, we calculate the exponential of the region-level entropy to obtain $D_\alpha^{(2)}$, the alpha diversity at the regional level. We proceed in a similar fashion to obtain $D_\alpha^{(1)}$, the diversity at the population level, but in this case we need to average over regions and populations within regions.

The calculation of the equivalent diversities based on species count data can be carried out using the exact same procedure described above, but in this case $N_{ijk}$ represents the number of individuals of species i in population j and region k. All formulas for gamma, alpha and beta, along with the differentiation measures at each level, are given in Table 3. The formulas can be directly generalized to any arbitrary number of levels (see Section 5).
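The steps just described can be sketched in a few lines of base R. This is a minimal illustration on a hypothetical count matrix, not the iDIP implementation; the object names (`N`, `region`, `shannon`) are assumptions made for the example, and weights are taken proportional to sample sizes, as in the weight definitions above.

```r
# Hypothetical counts: 3 alleles (rows) x 4 populations (columns).
N <- matrix(c(10,  0, 5,
               8,  2, 5,
               0, 12, 3,
               1,  9, 5),
            nrow = 3,
            dimnames = list(paste0("Allele", 1:3), paste0("Pop", 1:4)))
region <- c("R1", "R1", "R2", "R2")            # region of each population

# Shannon entropy with the convention 0 * ln(0) = 0.
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

w_jk <- colSums(N) / sum(N)                    # population weights
w_k  <- as.vector(tapply(w_jk, region, sum))   # region weights w_{+k}

# Entropies at the ecosystem, region and population levels.
H_gamma  <- shannon(rowSums(N) / sum(N))
N_region <- t(rowsum(t(N), region))            # counts pooled within regions
H_alpha2 <- sum(w_k  * apply(N_region, 2, function(x) shannon(x / sum(x))))
H_alpha1 <- sum(w_jk * apply(N,        2, function(x) shannon(x / sum(x))))

# Diversities (effective numbers) and their multiplicative decomposition.
D_gamma  <- exp(H_gamma)
D_alpha2 <- exp(H_alpha2)
D_alpha1 <- exp(H_alpha1)
D_beta2  <- D_gamma / D_alpha2
D_beta1  <- D_alpha2 / D_alpha1

# Differentiation measures, normalized by the entropy of the weights.
Delta2 <- (H_gamma - H_alpha2) / shannon(w_k)
Delta1 <- (H_alpha2 - H_alpha1) / (shannon(w_jk) - shannon(w_k))
```

By construction, `D_alpha1 * D_beta1 * D_beta2` reproduces `D_gamma`, which is the decomposition summarized in Table 1.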
4.2 | Formulation in terms of phylogenetic diversity

We first present an overview of phylogenetic diversity measures applied to a single nonhierarchical case (henceforth referred to as a single aggregate, for brevity) and then extend it to consider a hierarchically structured system.

4.2.1 | Phylogenetic diversity measures in a single aggregate

To formulate phylogenetic diversity in a single aggregate, we assume that all species or alleles in an aggregate are connected by a rooted ultrametric or nonultrametric phylogenetic tree, with all species/alleles as tip nodes. All phylogenetic diversity measures discussed below are computed from a given fixed tree base, or a time reference point, that is ancestral to all species/alleles in the aggregate. A convenient time
FIGURE 1 The spatial representation of 32 populations organized into a spatial hierarchy based on three scale levels: subregions (eight populations each), regions (16 populations each) and the ecosystem (all 32 populations). The dendrogram (upper panel: hierarchical representation of levels) represents the spatial relationship (i.e., geographic distance), in which each tip represents a population found in a particular site (lower panel). The cartographic representation (lower panel) represents the spatial distribution of these same populations along a geographic coordinate system.
reference point is the age of the root of the phylogenetic tree spanned by all elements. Assume that there are B branch segments in the tree and thus there are B corresponding nodes, $B \ge S$. The set of species/alleles is expanded to include the internal nodes as well as the terminal nodes representing species/alleles, which will then be the first S elements (see Figure S2).

Let $L_i$ denote the length of branch i in the tree, i = 1, 2, …, B. We first expand the set of relative abundances of elements $(p_1, p_2, \ldots, p_S)$ (see Equation 1) to a larger set $\{a_i, i = 1, 2, \ldots, B\}$ by defining $a_i$ as the total relative abundance of the elements descended from the ith node/branch, i = 1, 2, …, B. In phylogenetic diversity, an important parameter is the mean branch length $\bar{T}$, the abundance-weighted mean of the distances from the tree base to each of the terminal branch tips, that is, $\bar{T} = \sum_{i=1}^{B} L_i a_i$. For an ultrametric tree, the mean branch length reduces to the tree depth T; see Figure 1 in Chao, Chiu, and Jost (2010) for an example. For simplicity, our following formulation of phylogenetic diversity is based on ultrametric trees. The extension to nonultrametric trees is straightforward (via replacing T by $\bar{T}$ in all formulas).
Chao et al. (2010, 2014) generalized Hill numbers to a class of phylogenetic diversity of order q, $^{q}PD$, derived as

$$^{q}PD = \left\{ \sum_{i=1}^{B} L_i \left( \frac{a_i}{T} \right)^{q} \right\}^{1/(1-q)} \qquad (4)$$

This measure quantifies the effective total branch length during the time interval from T years ago to the present. If q = 0, then $^{0}PD = \sum_{i=1}^{B} L_i$, which is the well-known Faith's PD, the sum of the branch lengths of a phylogenetic tree connecting all species. However, this measure does not consider species abundances. Rao's quadratic entropy Q (Rao & Nayak, 1985) is a widely used measure which takes into account both phylogeny and species abundances. This measure is a generalization of the Gini–Simpson index and quantifies the average phylogenetic distance between any two individuals randomly selected from the assemblage. Chao et al. (2010) showed that the $^{q}PD$ measure of order q = 2 is a simple transformation of quadratic entropy, that is, $^{2}PD = T/(1 - Q/T)$. Again, here we focus on the $^{q}PD$ measure of order q = 1, which can be expressed as a function of the phylogenetic entropy (Allen, Kon, & Bar-Yam, 2009):

$$^{1}PD = \lim_{q \to 1} {}^{q}PD = \exp\left[ -\sum_{i=1}^{B} L_i \frac{a_i}{T} \ln\left( \frac{a_i}{T} \right) \right] \equiv T \exp(I/T) \qquad (5)$$

Here I denotes the phylogenetic entropy

$$I = -\sum_{i=1}^{B} L_i a_i \ln a_i \qquad (6)$$

which is a generalization of Shannon's entropy that incorporates phylogenetic distances among elements. Note that when there are only tip nodes and all branches have unit length, then we have T = 1 and $^{q}PD$ reduces to the Hill number of order q (in Equation 1).

4.2.2 | Phylogenetic diversity decomposition in a multiple-level hierarchically structured system

The single-aggregate formulation can be extended to consider a hierarchical spatially structured system. For the sake of simplicity, we consider three levels (ecosystem, region and community/population), as we did for the species/allelic diversity decomposition. Assume that there are S elements in the ecosystem. For the rooted phylogenetic tree spanned by all S elements in the ecosystem, we define the root (or a time reference point), the number of nodes/branches B and the branch length $L_i$ in a similar manner as those in a single aggregate.

For the tip nodes, as in the framework of species and allelic diversity (in Table 2), define $p_{i|jk}$, $p_{i|+k}$ and $p_{i|++}$, i = 1, 2, …, S, as the ith species or allele relative frequencies at the population, regional and ecosystem level, respectively. To expand these relative frequencies to the branch set, we define $a_{i|jk}$, i = 1, 2, …, B, as the summed relative abundance of the species/alleles descended from the ith node/branch in population j and region k, with similar definitions for $a_{i|+k}$ and $a_{i|++}$, i = 1, 2, …, B; see Figure 1 of Chao et al. (2015) for an illustrative example. The decomposition for phylogenetic diversity is similar to that for Hill numbers presented in Table 1, except that now all measures are replaced by phylogenetic diversity. The corresponding phylogenetic gamma, alpha and beta diversities at each level are
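Equations 4 to 6 can be evaluated directly once branch lengths and node abundances are available. The sketch below is a minimal base R illustration on a hypothetical three-tip ultrametric tree; the function name `qPD` and the toy tree are assumptions made for the example and are not taken from the iDIPphylo code.

```r
# Hypothetical ultrametric tree ((A:1, B:1)AB:1, C:2); depth T = 2,
# S = 3 tips and B = 4 branches (three tips plus internal node AB).
L <- c(A = 1, B = 1, C = 2, AB = 1)            # branch lengths L_i
p <- c(A = 0.5, B = 0.3, C = 0.2)              # tip relative abundances
a <- c(p, AB = unname(p["A"] + p["B"]))        # node abundances a_i

# Phylogenetic diversity of order q (Equation 4), with the q = 1 limit
# handled separately (Equation 5).
qPD <- function(L, a, q) {
  Tbar <- sum(L * a)                           # mean branch length
  keep <- a > 0
  L <- L[keep]; a <- a[keep]
  if (abs(q - 1) < 1e-8) {
    exp(-sum(L * (a / Tbar) * log(a / Tbar)))
  } else {
    sum(L * (a / Tbar)^q)^(1 / (1 - q))
  }
}

I_entropy <- -sum(L * a * log(a))              # phylogenetic entropy (Equation 6)
Tbar <- sum(L * a)

PD0 <- qPD(L, a, 0)                            # Faith's PD: sum of branch lengths
PD1 <- qPD(L, a, 1)                            # equals Tbar * exp(I_entropy / Tbar)
PD2 <- qPD(L, a, 2)
```

For a star tree with unit branch lengths and no internal nodes, for example `qPD(c(1, 1, 1), p, 1)`, the function returns the ordinary Hill number of order 1 for `p`, in line with the remark following Equation 6.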
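TABLE 1 Various diversities in a hierarchically structured system and their decomposition based on diversity measure $D = {}^{1}D$ (Hill number of order q = 1 in Equation 2); for phylogenetic diversity decomposition, replace D with $PD = {}^{1}PD$ (phylogenetic diversity measure of order q = 1 in Equation 5); see Table 3 for all formulas for D and PD. The superscripts (1) and (2) denote the hierarchical level of focus.

| Hierarchical level | Diversity: Within | Diversity: Between | Diversity: Total | Decomposition |
|---|---|---|---|---|
| 3 Ecosystem | − | − | $D_\gamma$ | $D_\gamma = D_\alpha^{(1)} D_\beta^{(1)} D_\beta^{(2)}$ |
| 2 Region | $D_\alpha^{(2)}$ | $D_\beta^{(2)} = D_\gamma^{(2)} / D_\alpha^{(2)}$ | $D_\gamma^{(2)} = D_\gamma$ | $D_\gamma = D_\alpha^{(2)} D_\beta^{(2)}$ |
| 1 Community or population | $D_\alpha^{(1)}$ | $D_\beta^{(1)} = D_\gamma^{(1)} / D_\alpha^{(1)}$ | $D_\gamma^{(1)} = D_\alpha^{(2)}$ | $D_\alpha^{(2)} = D_\alpha^{(1)} D_\beta^{(1)}$ |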
TABLE 2 Calculation of allele/species relative frequencies at the different levels of the hierarchical structure

| Hierarchical level | Species/allele relative frequency |
|---|---|
| Population | $p_{i\mid jk} = N_{ijk} / N_{+jk} = N_{ijk} / \sum_{i=1}^{S} N_{ijk}$ |
| Region | $p_{i\mid +k} = N_{i+k} / N_{++k} = \sum_{j=1}^{J_k} (w_{jk} / w_{+k})\, p_{i\mid jk}$ |
| Ecosystem | $p_{i\mid ++} = N_{i++} / N_{+++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i\mid jk}$ |
given in Table 3, along with the corresponding differentiation measures. Appendix S3 presents all mathematical derivations and discusses the desirable monotonicity and "true dissimilarity" properties that our proposed differentiation measures possess.
5 | IMPLEMENTATION OF THE FRAMEWORK BY MEANS OF AN R PACKAGE

The framework described above has been implemented in the R function iDIP (information-based Diversity Partitioning), which is provided as Data S1. We also provide a short introduction with a simple example data set to explain how to obtain numerical results equivalent to those provided in Tables 4 and 5 below for the Hawaiian archipelago example data set.

The R function iDIP requires two input matrices:

1. Abundance data specifying species/alleles (rows) raw or relative abundances for each population/community (columns).
2. Structure matrix describing the hierarchical structure of spatial subdivision; see a simple example given in Data S1. There is no limit to the number of spatial subdivisions.

The output includes (i) gamma (or total) diversity, and alpha and beta diversity for each level; (ii) the proportion of total beta information (among aggregates) found at each level; and (iii) mean differentiation (dissimilarity) at each level.

We also provide the R function iDIPphylo, which implements an information-based decomposition of phylogenetic diversity and therefore can take into account the evolutionary history of the species being studied. This function requires the two matrices mentioned above plus a phylogenetic tree in Newick format. For interested users without knowledge of R, we also provide an online version available from https://chao.shinyapps.io/iDIP/. This interactive web application was developed using Shiny (https://shiny.rstudio.com). The webpage contains tabs providing a short introduction describing how to use the tool, along with a detailed User's Guide, which provides proper interpretations of the output through numerical examples.
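As an illustration of the two inputs, the following base R sketch builds a hypothetical abundance matrix and a matching three-level structure matrix, laid out with one row per level and one column per population. The counts are invented for the example, and the final call is left as a comment because it assumes that the iDIP function from Data S1 has already been loaded.

```r
# Hypothetical abundance data: 4 alleles (rows) x 5 communities (columns).
Data <- cbind(c(3, 0, 2, 1), c(4, 1, 0, 0), c(0, 5, 1, 2),
              c(1, 4, 0, 3), c(0, 2, 2, 2))
rownames(Data) <- paste0("Allele", 1:4)
colnames(Data) <- paste0("Community", 1:5)

# Structure matrix: one row per level, from ecosystem down to community.
Struc <- rbind(rep("Ecosystem", 5),
               c(rep("Region1", 2), rep("Region2", 3)),
               paste0("Community", 1:5))

# Checks mirroring the stated input requirements: matching dimensions,
# no missing entries, and no empty population/community.
inputs_ok <- ncol(Data) == ncol(Struc) && !anyNA(Data) && all(colSums(Data) > 0)

# With iDIP from Data S1 loaded, the decomposition would then be obtained by:
# iDIP(Data, Struc)
```

Adding further rows to `Struc` between the ecosystem and community rows would describe additional levels such as subregions.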
6 | SIMULATION STUDY TO SHOW THE CHARACTERISTICS OF THE FRAMEWORK

Here we describe a simple simulation study to demonstrate the utility and numerical behaviour of the proposed framework. We considered an ecosystem composed of 32 populations divided into four hierarchical levels (ecosystem, region, subregion, population; Figure 1). The number of populations at each level was kept constant across all simulations (i.e., ecosystem with 32 populations, regions with 16 populations each, and subregions with eight
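TABLE 3 Formulas for α, β and γ, along with differentiation measures, at each hierarchical level of spatial subdivision for species/allelic diversity and phylogenetic diversity. Here $D = {}^{1}D$ (Hill number of order q = 1 in Equation 2), $PD = {}^{1}PD$ (phylogenetic diversity of order q = 1 in Equation 5), T denotes the depth of an ultrametric tree, H = Shannon entropy (Equation 2), I = phylogenetic entropy (Equation 6).

| Hierarchical level | Diversity | Species/allelic diversity | Phylogenetic diversity |
|---|---|---|---|
| Level 3: Ecosystem | gamma | $D_\gamma = \exp\left( -\sum_{i=1}^{S} p_{i\mid ++} \ln p_{i\mid ++} \right) \equiv \exp(H_\gamma)$ | $PD_\gamma = T \times \exp\left( -\sum_{i=1}^{B} L_i a_{i\mid ++} \ln a_{i\mid ++} / T \right) \equiv T \times \exp(I_\gamma / T)$ |
| Level 2: Region | gamma | $D_\gamma^{(2)} = D_\gamma$ | $PD_\gamma^{(2)} = PD_\gamma$ |
| | alpha | $D_\alpha^{(2)} = \exp(H_\alpha^{(2)})$, where $H_\alpha^{(2)} = \sum_k w_{+k} H_{\alpha k}^{(2)}$ and $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i\mid +k} \ln p_{i\mid +k}$ | $PD_\alpha^{(2)} = T \times \exp(I_\alpha^{(2)} / T)$, where $I_\alpha^{(2)} = \sum_k w_{+k} I_{\alpha k}^{(2)}$ and $I_{\alpha k}^{(2)} = -\sum_{i=1}^{B} L_i a_{i\mid +k} \ln a_{i\mid +k}$ |
| | beta | $D_\beta^{(2)} = D_\gamma^{(2)} / D_\alpha^{(2)}$ | $PD_\beta^{(2)} = PD_\gamma^{(2)} / PD_\alpha^{(2)}$ |
| Level 1: Population or community | gamma | $D_\gamma^{(1)} = D_\alpha^{(2)}$ | $PD_\gamma^{(1)} = PD_\alpha^{(2)}$ |
| | alpha | $D_\alpha^{(1)} = \exp(H_\alpha^{(1)})$, where $H_\alpha^{(1)} = \sum_{j,k} w_{jk} H_{\alpha jk}^{(1)}$ and $H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i\mid jk} \ln p_{i\mid jk}$ | $PD_\alpha^{(1)} = T \times \exp(I_\alpha^{(1)} / T)$, where $I_\alpha^{(1)} = \sum_{j,k} w_{jk} I_{\alpha jk}^{(1)}$ and $I_{\alpha jk}^{(1)} = -\sum_{i=1}^{B} L_i a_{i\mid jk} \ln a_{i\mid jk}$ |
| | beta | $D_\beta^{(1)} = D_\gamma^{(1)} / D_\alpha^{(1)}$ | $PD_\beta^{(1)} = PD_\gamma^{(1)} / PD_\alpha^{(1)}$ |

Differentiation among aggregates at each level

| Hierarchical level | Species/allelic diversity | Phylogenetic diversity |
|---|---|---|
| Level 2: Among regions | $\Delta_D^{(2)} = \dfrac{H_\gamma - H_\alpha^{(2)}}{-\sum_k w_{+k} \ln w_{+k}}$ | $\Delta_{PD}^{(2)} = \dfrac{I_\gamma - I_\alpha^{(2)}}{-T \sum_k w_{+k} \ln w_{+k}}$ |
| Level 1: Population/community within region | $\Delta_D^{(1)} = \dfrac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{j,k} w_{jk} \ln(w_{jk} / w_{+k})}$ | $\Delta_{PD}^{(1)} = \dfrac{I_\alpha^{(2)} - I_\alpha^{(1)}}{-T \sum_{j,k} w_{jk} \ln(w_{jk} / w_{+k})}$ |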
populations each). Note that here we used a hierarchy with four spatial subdivisions instead of the three levels used in the presentation of the framework. We used three levels in the presentation to simplify the calculations, whereas in the simulations we used four levels to verify the performance of the framework in a more in-depth manner.
We explored six scenarios varying in the degree of genetic structuring, from very strong (Figure 2, top left panel) to very weak (Figure 2, bottom right panel), and for each we generated spatially structured genetic data for 10 unlinked bi-allelic loci using an algorithm loosely based on the genetic model of Coop, Witonsky, Di Rienzo, and Pritchard (2010). More explicitly, to generate correlated allele frequencies across populations for bi-allelic loci, we drew 10 random vectors of dimension 32 from a multivariate normal distribution with mean zero and a covariance matrix corresponding to the particular genetic structure scenario being considered. To construct the covariance matrix, we first assumed that the covariance between populations decreased with distance, so that the off-diagonal elements (covariances) for closest geographic neighbours were set to 4, those for the second nearest neighbours were set to 3, and so on; accordingly, the main diagonal values (variances) were set to 5. By multiplying the off-diagonal elements of this variance–covariance matrix by a constant (δ), we manipulated the strength of the spatial genetic structure from strong (δ = 0.1; Figure 2) to weak (δ = 6; Figure 2). Delta values were chosen to demonstrate gradual changes in estimates across diversity components. Using this procedure, we generated a matrix of random normally distributed N(0, 1) deviates $\varepsilon_{il}$ for each population i and locus l. The random deviates were transformed into allele frequencies constrained between 0 and 1 using the simple transform

$$p_{il} = \begin{cases} 0 & \text{if } \varepsilon_{il} < 0 \\ \varepsilon_{il} & \text{if } 0 \le \varepsilon_{il} \le 1 \\ 1 & \text{if } \varepsilon_{il} > 1 \end{cases}$$

where $p_{il}$ is the relative frequency of allele A1 at the lth locus in population i, and therefore $q_{il} = (1 - p_{il})$ is the relative frequency of allele A2. Each bi-allelic locus was analysed separately by our framework, and estimated values of $D_\gamma$, $D_\alpha$ and $D_\beta$ for each spatial level (see Figure 1) were averaged across the 10 loci.

To simulate a realistic distribution of the number of individuals across populations, we generated random values from a log-normal distribution with mean 0 and log of standard deviation 1; these values were then multiplied by randomly generated deviates from a Poisson distribution with λ = 30 to obtain a wide range of population/community sizes. Rounded values (to mimic abundances of individuals) were then multiplied by $p_{il}$ and $q_{il}$ to generate allele abundances. Given that the number of individuals was randomly generated across populations, there is no spatial correlation in abundance of individuals across the landscape, which means that the genetic spatial patterns were solely determined by the variance–covariance matrix used to generate correlated allele frequencies across populations. This facilitates interpretation of the simulation results, allowing us to demonstrate that the framework can uncover subtle spatial effects associated with population connectivity (see below).

For each spatial structure, we generated 100 matrices of allele frequencies, and each matrix was analysed separately to obtain distributions for $D_\gamma$, $D_\alpha$, $D_\beta$ and $\Delta_D$. Figure 2 presents heat maps of the correlation in allele frequencies across populations for one simulated data set under each δ value and shows that our algorithm can generate a wide range of genetic structures comparable to those generated by other, more complex simulation protocols (e.g., de Villemereuil, Frichot, Bazin, Francois, & Gaggiotti, 2014).

Figure 3 shows the distribution of $D_\alpha$, $D_\beta$ and $\Delta_D$ values for the three levels of geographic variation below the ecosystem level (i.e., $D_\gamma$, genetic diversity). The results clearly show that our framework detects differences in genetic diversity across different levels of spatial genetic structure. As expected, the effective number of alleles ($D_\alpha$ component, top row) increases per region and subregion as the spatial structure becomes weaker (i.e., from small to large δ values) but remains constant at the population level, as there is no spatial structure at this level (i.e., populations are panmictic), so diversity is independent of δ.

The $D_\beta$ component (middle row) quantifies the effective number of aggregates (regions, subregions, populations) at each hierarchical level of spatial subdivision. The larger the number of aggregates at a given level, the more heterogeneous that level is. Thus, it is also a measure of compositional dissimilarity at each level. We use this interpretation to describe the results in a more intuitive manner. As expected, as δ increases, dissimilarity between regions (middle left panel) decreases because spatial genetic structure becomes weaker, and the compositional dissimilarity among populations within subregions (middle right panel) increases because the strong spatial correlation among populations within subregions breaks down (Figure 3, centre left panel). The compositional dissimilarity between subregions within regions (Figure 3, middle centre panel) first increases and then decreases with increasing δ. This is due to an "edge effect" associated with the marginal status of the subregions at the extremes of the species range (extreme right and left subregions in Figure 2). As δ increases, the composition of the two subregions at the centre of the species range, which belong to different regions, changes more rapidly than that of the two marginal subregions. Thus, the compositional dissimilarity between subregions within regions increases. However, as δ continues to increase, spatial effects disappear and dissimilarity decreases.
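The transform and the generation of allele abundances can be sketched in base R as follows. This is a simplified illustration, not the simulation code used in the study: independent standard normal draws stand in for the spatially correlated multivariate normal deviates, the seed is arbitrary, and the log-normal parameters are read as `meanlog = 0` and `sdlog = 1`, which is an assumption about the wording in the text.

```r
set.seed(1)                                  # arbitrary seed, for reproducibility
n_pop  <- 32                                 # populations
n_loci <- 10                                 # unlinked bi-allelic loci

# Placeholder deviates: independent N(0, 1) draws (the study instead used
# deviates correlated across populations through a covariance matrix).
eps <- matrix(rnorm(n_pop * n_loci), nrow = n_pop)

# Transform: truncate each deviate to the interval [0, 1].
p <- pmin(pmax(eps, 0), 1)                   # frequency of allele A1
q <- 1 - p                                   # frequency of allele A2

# Population sizes: log-normal values multiplied by Poisson(30) deviates,
# then rounded to mimic counts of individuals.
size <- round(rlnorm(n_pop, meanlog = 0, sdlog = 1) * rpois(n_pop, lambda = 30))

# Allele abundances for each population (rows) and locus (columns).
count_A1 <- size * p
count_A2 <- size * q
```

Each column of `count_A1` and `count_A2` then provides the two rows of an allele-by-population abundance matrix for one locus, which could be analysed locus by locus as described above.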
FIGURE 2 Heat maps of allele frequency correlations between pairs of populations for different δ values. Delta values control the strength of the spatial genetic structure among populations, with low δ values having the strongest spatial correlation among populations. Each heat map represents the outcome of a single simulation, and each dot represents the allele frequency correlation between two populations. Thus, the diagonal represents the correlation of a population with itself and is always 1 regardless of the δ value considered in the simulation. Colours indicate the range of correlation values. As in Figure 1, the dendrograms represent the spatial relationship (i.e., geographic distance) between populations.

The differentiation component $\Delta_D$ (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity $D_\beta$. This is expected, as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix in which different δ values were used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength of spatial genetic variation.

FIGURE 3 Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity gamma is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions).
For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ: $D_\gamma$ = 1.6 on average across simulations for δ = 0.1, up to $D_\gamma$ = 1.9 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here regions, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales; the results would follow the same patterns as the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could equally have been considered, including a phylogenetic structure among them.
7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM

All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish, Etelis coruscans (Andrews et al., 2014), and a shallow-water fish, Zebrasoma flavescens (Eble et al., 2011).

The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km south-west of the MHI at Johnston Atoll and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).

FIGURE 4 Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass.
We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.

7.1 | Species diversity

Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, $D_\gamma$, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_\alpha^{(2)}$ = 37.77; Island: $D_\alpha^{(1)}$ = 27.75). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_\beta^{(2)}$ = 1.29 represents the number of region equivalents in the Hawaiian archipelago, while $D_\beta^{(1)}$ = 1.361 is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on the sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain $\Delta_D$ (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.

We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than the diversities of order q = 1 and 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and, to a lesser extent, Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
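The effect of the order q can be illustrated with a short base R sketch. The function `hill` and the two abundance vectors below are hypothetical and serve only to show how Hill numbers of orders 0, 1 and 2 respond to dominance; they are not taken from the Hawaiian data set.

```r
# Hill number of order q for a vector of (relative or raw) abundances.
hill <- function(p, q) {
  p <- p[p > 0] / sum(p)                       # drop zeros and normalize
  if (abs(q - 1) < 1e-8) {
    exp(-sum(p * log(p)))                      # q = 1: exponential of Shannon entropy
  } else {
    sum(p^q)^(1 / (1 - q))
  }
}

# Two hypothetical communities with 10 species each.
even      <- rep(1 / 10, 10)                   # perfectly even
dominated <- c(0.55, rep(0.05, 9))             # one species holds 55%

profile_even      <- sapply(c(0, 1, 2), function(q) hill(even, q))
profile_dominated <- sapply(c(0, 1, 2), function(q) hill(dominated, q))
```

For the even community all three orders return the species richness, whereas for the dominated community the values drop as q increases, because rare species contribute progressively less to the diversity.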
Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.

7.2 | Genetic diversity

Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed
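TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

| Level | Diversity |
|---|---|
| 3 Hawaiian Archipelago | $D_\gamma$ = 48.744 |
| 2 Region | $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)}$ = 37.773; $D_\beta^{(2)}$ = 1.290 |
| 1 Island (community) | $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)}$ = 27.752; $D_\beta^{(1)}$ = 1.361 |

Differentiation among aggregates at each level

| Level | Differentiation |
|---|---|
| 2 Region | $\Delta_D^{(2)}$ = 0.290 |
| 1 Island (community) | $\Delta_D^{(1)}$ = 0.153 |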
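The entries of Table 4 are tied together by the multiplicative decomposition in Table 1, which can be verified with a few lines of R. The sketch below uses only the rounded gamma and alpha values from Table 4; the variable names are illustrative.

```r
# Rounded values from Table 4 (fish species diversity of order q = 1).
D_gamma  <- 48.744
D_alpha2 <- 37.773      # region level
D_alpha1 <- 27.752      # island level

# Beta diversities as ratios of successive levels.
D_beta2 <- D_gamma / D_alpha2     # region equivalents
D_beta1 <- D_alpha2 / D_alpha1    # island equivalents within a region

# Species equivalents lost on descending one level in the hierarchy.
loss_to_region <- D_gamma - D_alpha2
loss_to_island <- D_alpha2 - D_alpha1

print(round(c(D_beta2 = D_beta2, D_beta1 = D_beta1,
              loss_to_region = loss_to_region,
              loss_to_island = loss_to_island), 3))
```

The two ratios agree with the tabulated beta diversities up to rounding, and the two differences correspond to the loss of roughly 10 species equivalents per level mentioned in the text.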
among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that, despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter, and therefore may lead to large differences in larval dispersal potential between the two species.

Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.

As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).
8emsp |emspDISCUSSION
Biodiversityisaninherentlyhierarchicalconceptcoveringseverallev-elsoforganizationandspatialscalesHoweveruntilnowwedidnothaveaframeworkformeasuringallspatialcomponentsofbiodiversityapplicable to both genetic and species diversitiesHerewe use an
information-basedmeasure(Hillnumberoforderq=1)todecomposeglobal genetic and species diversity into their various regional- andcommunitypopulation-levelcomponentsTheframeworkisapplica-bletohierarchicalspatiallystructuredscenarioswithanynumberoflevels(ecosystemregionsubregionhellipcommunitypopulation)Wealsodevelopedasimilarframeworkforthedecompositionofphyloge-neticdiversityacrossmultiple-levelhierarchicallystructuredsystemsToillustratetheusefulnessofourframeworkweusedbothsimulateddatawithknowndiversitystructureandarealdatasetstressingtheimportanceofthedecompositionforvariousapplicationsincludingbi-ologicalconservationInwhatfollowswefirstdiscussseveralaspectsofourformulationintermsofspeciesandgeneticdiversityandthenbrieflyaddresstheformulationintermsofphylogeneticdiversity
Hillnumbersareparameterizedbyorderq which determines the sensitivity of the diversity measure to common and rare elements (al-lelesorspecies)Our framework isbasedonaHillnumberoforderq=1 which weights all elements in proportion to their frequencyand leadstodiversitymeasuresbasedonShannonrsquosentropyThis isafundamentallyimportantpropertyfromapopulationgeneticspointofviewbecauseitcontrastswithmeasuresbasedonheterozygositywhich are of order q=2andthereforegiveadisproportionateweighttocommonalleles Indeed it iswellknownthatheterozygosityandrelatedmeasuresareinsensitivetochangesintheallelefrequenciesofrarealleles(egAllendorfLuikartampAitken2012)sotheyperformpoorly when used on their own to detect important demographicchanges in theevolutionaryhistoryofpopulationsandspecies (eg
F IGURE 5emspDiversitymeasuresatallsampledislands(communitiespopulations)expressedintermsofHillnumbersofordersq = 0 1 and 2(a)FishspeciesdiversityofHawaiiancoralreefcommunities(b)geneticdiversityforEtelis coruscans(c)geneticdiversityforZebrasoma flavescens
(a) species diversity (b) E coruscans
(c) Z flabescens
14emsp |emsp emspensp GAGGIOTTI eT Al
bottlenecks). That said, it is still very useful to characterize the diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts the Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.
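The reading of Hill numbers of order q = 0, 1, 2 described above can be illustrated with a minimal sketch. The function name `hill_number` and the two frequency vectors are hypothetical and chosen only for illustration; they are not taken from the Hawaiian data set or from the R package mentioned in the text.

```python
import math

def hill_number(freqs, q):
    """Hill number (effective number of elements) of order q."""
    p = [x for x in freqs if x > 0]      # absent elements do not contribute
    total = sum(p)
    p = [x / total for x in p]           # renormalise in case counts are passed
    if abs(q - 1.0) < 1e-12:
        # q = 1 is defined as a limit: the exponential of Shannon entropy
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

# Hypothetical allele frequencies: one dominant allele plus several rare ones
uneven = [0.90, 0.04, 0.03, 0.02, 0.01]
# Hypothetical perfectly even population with four alleles
even = [0.25, 0.25, 0.25, 0.25]

profile_uneven = {q: hill_number(uneven, q) for q in (0, 1, 2)}
profile_even = {q: hill_number(even, q) for q in (0, 1, 2)}
print(profile_uneven)
print(profile_even)
```

For the uneven vector, the order-0 value equals the raw count of alleles and the values of order 1 and 2 are much smaller, which is the signature of many rare alleles. For the even vector all three orders coincide.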
As proved by Chao et al. (2015, Appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population, and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost’s D, possess neither of these two properties.
Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and “true dissimilarity” properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and
FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness; (b) allelic richness for Etelis coruscans; (c) allelic richness for Zebrasoma flavescens
[Figure panel labels: (a) Species diversity; (b) Genetic diversity, E. coruscans; (c) Genetic diversity, Z. flavescens]
TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level                       Diversity
3 Hawaiian Archipelago      D_γ = 8.249
2 Region                    D_γ^(2) = D_γ;      D_α^(2) = 8.083;   D_β^(2) = 1.016
1 Island (population)       D_γ^(1) = D_α^(2);  D_α^(1) = 7.077;   D_β^(1) = 1.117

Differentiation among aggregates at each level
2 Region                    Δ_D^(2) = 0.023
1 Island (community)        Δ_D^(1) = 0.062
our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.
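A minimal sketch of the three-level decomposition (ecosystem, region, population) may help to make the structure of Tables 5 and 6 concrete. It assumes, following the symbol definitions in Appendix S2, that regional frequencies are weighted averages of population frequencies and that alpha entropies at each level are weighted means of the corresponding entropies; the function `decompose` and the toy data are hypothetical and are not the authors' implementation.

```python
import math

def entropy(p):
    """Shannon entropy (natural log) of a frequency vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def mix(dists, weights):
    """Weighted average of several frequency vectors."""
    return [sum(w * d[i] for w, d in zip(weights, dists))
            for i in range(len(dists[0]))]

def decompose(regions, weights):
    """regions[k][j]: frequency vector of population j in region k.
    weights[k][j]: weight w_jk of that population; all weights sum to 1."""
    region_w = [sum(ws) for ws in weights]                      # w_{+k}
    region_p = [mix(pops, [w / rw for w in ws])                 # p_{i|+k}
                for pops, ws, rw in zip(regions, weights, region_w)]
    total_p = mix(region_p, region_w)                           # p_{i|++}
    h_gamma = entropy(total_p)
    h_alpha2 = sum(rw * entropy(p) for rw, p in zip(region_w, region_p))
    h_alpha1 = sum(w * entropy(p)
                   for pops, ws in zip(regions, weights)
                   for p, w in zip(pops, ws))
    d_gamma, d_alpha2, d_alpha1 = (math.exp(h) for h in (h_gamma, h_alpha2, h_alpha1))
    return {
        "D_gamma": d_gamma,
        "D_alpha2": d_alpha2, "D_beta2": d_gamma / d_alpha2,
        "D_alpha1": d_alpha1, "D_beta1": d_alpha2 / d_alpha1,
    }

# Toy data (assumption): two regions with no shared alleles,
# each containing two identical populations
regions = [
    [[0.5, 0.5, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]],
    [[0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.5, 0.5]],
]
weights = [[0.25, 0.25], [0.25, 0.25]]
result = decompose(regions, weights)
print(result)
```

In this toy case all differentiation sits at the regional level, so the regional beta diversity equals the number of regions and the island-level beta diversity equals one. For a single locus the gamma diversity is the product of the lowest-level alpha diversity and the beta diversities of all levels; values averaged over loci, as in the tables, need not satisfy this identity exactly.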
Recently, Karlin and Smouse (2017; Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the “true dissimilarity” property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of “true dissimilarity” can be enunciated as follows: If N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse’s (2017) measures are useful in quantifying other aspects of differentiation among aggregates but do not measure “true dissimilarity.” Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse’s measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
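The two-population example above can be checked numerically with the two-level Shannon differentiation of Appendix S3 (Eq. C1). The sketch below assumes equal population weights and encodes the 16 distinct alleles (4 shared, 6 private to each population) as frequency vectors; the function names are illustrative only.

```python
import math

def shannon_entropy(p):
    """Shannon entropy (natural log) of a frequency vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def shannon_differentiation(pops, weights):
    """Normalised difference between gamma and alpha entropies (two-level case)."""
    n_alleles = len(pops[0])
    pooled = [sum(w * pop[i] for w, pop in zip(weights, pops))
              for i in range(n_alleles)]
    h_gamma = shannon_entropy(pooled)
    h_alpha = sum(w * shannon_entropy(pop) for w, pop in zip(weights, pops))
    h_max = shannon_entropy(weights)     # equals -sum(w_j ln w_j)
    return (h_gamma - h_alpha) / h_max

# Alleles 1-4 are shared, 5-10 are private to population I, 11-16 to population II
pop_I = [0.1] * 4 + [0.1] * 6 + [0.0] * 6
pop_II = [0.1] * 4 + [0.0] * 6 + [0.1] * 6

delta = shannon_differentiation([pop_I, pop_II], [0.5, 0.5])
print(round(delta, 4))
```

With S = 10 and A = 4 the expected value under the “true dissimilarity” property is 1 − 4/10, and the computed differentiation matches it.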
In this paper, all diversity measures (alpha, beta, and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would be surely observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
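The negative bias of the observed (plug-in) entropy under undersampling can be seen in a small simulation. This is only an illustrative sketch under assumed settings (50 equally common species, sample sizes of 30 and 3,000, a fixed random seed); it does not implement the bias-reduction method of Chao and Jost (2015).

```python
import math
import random
from collections import Counter

def plugin_entropy(sample):
    """Observed entropy: sample proportions substituted into Shannon's formula."""
    n = len(sample)
    return -sum((c / n) * math.log(c / n) for c in Counter(sample).values())

rng = random.Random(1)                 # fixed seed so the sketch is reproducible
n_species = 50                         # assumed community: 50 equally common species
true_entropy = math.log(n_species)

def mean_plugin(sample_size, reps=200):
    """Average observed entropy over repeated random samples."""
    total = 0.0
    for _ in range(reps):
        sample = [rng.randrange(n_species) for _ in range(sample_size)]
        total += plugin_entropy(sample)
    return total / reps

small = mean_plugin(30)                # severe undersampling
large = mean_plugin(3000)              # nearly complete sampling
print(true_entropy, small, large)
```

The average observed entropy stays below the true value, and the shortfall is much larger for the small sample, which is the situation in which statistical estimation is recommended in the text.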
Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity Dβ and differentiation ΔD measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.
The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.
Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural
TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

Level                       Diversity
3 Hawaiian Archipelago      D_γ = 8.404
2 Region                    D_γ^(2) = D_γ;      D_α^(2) = 8.290;   D_β^(2) = 1.012
1 Island (community)        D_γ^(1) = D_α^(2);  D_α^(1) = 7.690;   D_β^(1) = 1.065

Differentiation among aggregates at each level
2 Region                    Δ_D^(2) = 0.014
1 Island (community)        Δ_D^(1) = 0.033
populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.
We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides “true diversity” measures that are consistent across levels of organization, and therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
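The percentages quoted for genetic diversity appear to correspond to the relative drop from the regional alpha diversity to the island alpha diversity reported in Tables 5 and 6. The sketch below checks that reading; it is an interpretation rather than a formula stated in the text, and it reads the table entries as 8.083 and 7.077 for E. coruscans and 8.290 and 7.690 for Z. flavescens (decimal points restored from the extracted values).

```python
def relative_drop(regional_alpha, island_alpha):
    """Proportional decrease in effective number of alleles from region to island."""
    return (regional_alpha - island_alpha) / regional_alpha

# Alpha diversities of order q = 1 as read from Tables 5 and 6 (assumed decimal placement)
drop_etelis = 100 * relative_drop(8.083, 7.077)
drop_zebrasoma = 100 * relative_drop(8.290, 7.690)
print(round(drop_etelis, 1), round(drop_zebrasoma, 1))
```

Both values agree with the quoted figures to within rounding of the tabulated inputs, which supports interpreting the comparison as a ratio of effective numbers of alleles at the two levels.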
In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)’s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)’s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon’s entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals, or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
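As a rough illustration of the “simple transformation” mentioned above, the sketch below computes the phylogenetic entropy I from branch lengths L_i and descendant abundances a_i (symbols as in Appendix S2) and exponentiates I divided by the mean branch length. The choice of exp(I / T̄) as the transformation is our reading of Chao et al. (2010) and should be treated as an assumption; the tree and abundances are hypothetical.

```python
import math

def phylo_entropy(branches):
    """Phylogenetic entropy: -sum of L_i * a_i * ln(a_i) over branch segments.
    branches is a list of (L_i, a_i) pairs."""
    return -sum(length * a * math.log(a) for length, a in branches if a > 0)

def mean_phylo_diversity_q1(branches):
    """Assumed transformation: exp(I / T_bar), an effective number of lineages."""
    t_bar = sum(length * a for length, a in branches)   # mean branch length
    return math.exp(phylo_entropy(branches) / t_bar)

# Hypothetical ultrametric tree ((A,B),C) of depth 2:
# tips A and B (length 1, abundance 0.25 each), their ancestral branch
# (length 1, abundance 0.5), and tip C (length 2, abundance 0.5)
tree = [(1.0, 0.25), (1.0, 0.25), (1.0, 0.5), (2.0, 0.5)]
# Star tree: three equally abundant, maximally distinct species
star = [(2.0, 1 / 3), (2.0, 1 / 3), (2.0, 1 / 3)]

d_tree = mean_phylo_diversity_q1(tree)
d_star = mean_phylo_diversity_q1(star)
print(d_tree, d_star)
```

For the star tree the measure reduces to the ordinary Hill number of order 1, while the shared ancestry of A and B in the first tree lowers the effective number of lineages below three.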
There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithms may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and “true dissimilarity” properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.
The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.
An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.
ACKNOWLEDGEMENTS
This work was assisted through participation in “Next Generation Genetic Monitoring” Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center’s Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.
DATA ARCHIVING STATEMENT
All data used in this manuscript are available in DRYAD (https://doi.org/dx.doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).
ORCID
Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493
Christine Edwards http://orcid.org/0000-0001-8837-4872
REFERENCES
Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2
Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101
Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.
Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis “marshi” (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665
Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.
Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.
Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654
Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551
Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247
Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272
Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540
Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471
Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014
Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819
Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083
Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1
Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2
Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036
Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75
Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352
Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x
Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x
Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1
Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x
Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum × falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.
Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.
Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.
Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3
Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x
Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.
Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x
Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x
Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x
Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.
Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x
Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.
Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.
Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696
Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Consortiu Hawaii Reef Connectivity (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B-Biological Sciences, 283(1829).
Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a “pristine” coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z
Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765
Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x
Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.
Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422
Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756
de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.
Whitlock, M. C. (2011). G’ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x
Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516
Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.
Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593
APPENDIX S1 Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an “ideal” population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the “effective” number of alleles in population 2 is two.
Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
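The equivalence stated in Appendix S1 can be verified with a short calculation. The sketch below assumes the table values read as shown (0.01, 0.10, 0.665, 0.215, 0.01 for population 2) and uses the standard relation between expected heterozygosity and the effective number of alleles of order q = 2; the function names are illustrative.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def effective_number_of_alleles(freqs):
    """Order-2 effective number: 1 / sum of squared frequencies = 1 / (1 - He)."""
    return 1.0 / sum(p * p for p in freqs)

pop1 = [0.5, 0.5, 0.0, 0.0, 0.0]
pop2 = [0.01, 0.10, 0.665, 0.215, 0.01]

he1 = expected_heterozygosity(pop1)
he2 = expected_heterozygosity(pop2)
ne1 = effective_number_of_alleles(pop1)
ne2 = effective_number_of_alleles(pop2)
print(he1, he2, ne1, ne2)
```

Both populations have an expected heterozygosity of about 0.5, so population 2, despite carrying five alleles, is as diverse (at order 2) as an ideal population with two equally frequent alleles.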
APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol                  Definition
$^{q}D$                 abundance diversity of order q, also referred to as Hill number of order q
$^{q}PD$                phylogenetic diversity of order q
$S$                     total number of distinct elements (species/alleles)
$H$                     Shannon entropy
$K$                     number of regions
$J_k$                   number of local populations/communities within region k
$w_{jk}$                weight given to population/community j of region k
$w_{+k}$                weight given to region k
$D_\alpha^{(l)}$        alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$         beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$        gamma abundance diversity at level l of the hierarchy
$l$                     superscript to denote population/community, subregion, region, …
$D_\gamma$              gamma abundance diversity at the ecosystem level
$N_{injk}$              number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$               total number of copies of allele/species i in population/community j of region k
$N_{+jk}$               total number of alleles/individuals in population j of region k
$N_{++k}$               total number of alleles/individuals in region k
$N_{+++}$               total number of alleles/individuals in the ecosystem
$p_{i|jk}$              frequency of allele/species i in population j of region k
$p_{i|+k}$              frequency of allele/species i in region k
$p_{i|++}$              frequency of allele/species i in the ecosystem
$H_{\alpha,jk}^{(1)}$   alpha entropy for population j of region k
$H_{\alpha,k}^{(2)}$    alpha entropy for region k
$H_\alpha^{(l)}$        total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$        abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$                     number of branch segments in the phylogenetic tree
$L_i$                   length of branch i = 1, 2, 3, …, B
$a_i$                   total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$               mean branch length
$T$                     depth of an ultrametric tree ($= \bar{T}$)
$Q$                     Rao’s quadratic entropy
$I$                     phylogenetic entropy
$a_{i|jk}$              total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$              total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$              total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$       alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$        beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$       gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$             gamma phylogenetic diversity at the ecosystem level
$I_{\alpha,jk}^{(1)}$   alpha phylogenetic entropy for population j of region k
$I_{\alpha,k}^{(2)}$    alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$        total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$     phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)
APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term “allele frequency” with “species frequency”. We first present the details for two-level hierarchy and then extend all procedures to three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{m=1}^{J} w_m p_{i|m} \Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1: For the gamma and alpha entropies defined above for two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof: Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

$$H_\gamma = -\sum_{i=1}^{S} \Big[ \sum_{j=1}^{J} w_j p_{i|j} \ln \Big( \sum_{m=1}^{J} w_m p_{i|m} \Big) \Big] \le -\sum_{i=1}^{S} \Big[ \sum_{j=1}^{J} w_j p_{i|j} \ln \big( w_j p_{i|j} \big) \Big]$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \qquad \text{(C1)}$$
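As a numerical check on Eq. (C1), the following R sketch computes the two-level Shannon differentiation from a matrix of allele frequencies and a weight vector. The helper names `shannon` and `shannon_diff` and the toy matrices are illustrative assumptions and are not part of the iDIP code.

```r
# Shannon entropy of a probability vector (zero entries contribute nothing)
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Two-level Shannon differentiation, Eq. (C1).
# P: S x J matrix whose column j holds the allele frequencies p_{i|j}
# w: population weights summing to 1
# (hypothetical helper written for this illustration)
shannon_diff <- function(P, w) {
  H_gamma <- shannon(as.vector(P %*% w))     # entropy of the pooled frequencies p_{i|+}
  H_alpha <- sum(w * apply(P, 2, shannon))   # weighted mean within-population entropy
  (H_gamma - H_alpha) / shannon(w)           # divide by -sum(w * log(w))
}

w <- c(0.2, 0.3, 0.5)

# Three populations with identical allele frequencies
P_same <- matrix(c(0.5, 0.3, 0.2), nrow = 3, ncol = 3)
d_same <- shannon_diff(P_same, w)

# Three completely distinct populations (no shared alleles)
P_distinct <- cbind(c(0.6, 0.4, 0, 0, 0, 0),
                    c(0, 0, 0.5, 0.5, 0, 0),
                    c(0, 0, 0, 0, 0.9, 0.1))
d_distinct <- shannon_diff(P_distinct, w)

# An intermediate case with partially shared alleles
P_mixed <- cbind(c(0.6, 0.4, 0), c(0.1, 0.4, 0.5), c(0, 0.2, 0.8))
d_mixed <- shannon_diff(P_mixed, w)
```

Under these assumptions `d_same` and `d_distinct` sit at the two ends of the range given by Theorem S3.1, while `d_mixed` falls strictly between them.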
This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the $J$ populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance.
Here the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: if multiple communities each have $S$ equally common species, with exactly $A$ species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives $1 - A/S$, the true proportion of non-shared species in a community.
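Property (B) can be checked by hand in a small case. The numbers below are our own illustrative choice (two equally weighted communities, $S = 4$ equally common species each, $A = 1$ shared species); they are not taken from the cited proofs.

```latex
% Pooled frequencies: the shared species has 1/4, each of the 6 unshared species has 1/8.
\begin{align*}
H_\gamma &= \tfrac{1}{4}\ln 4 + 6\cdot\tfrac{1}{8}\ln 8
          = \tfrac{1}{2}\ln 2 + \tfrac{9}{4}\ln 2 = \tfrac{11}{4}\ln 2, \\
H_\alpha &= \tfrac{1}{2}\ln 4 + \tfrac{1}{2}\ln 4 = 2\ln 2, \\
-\sum_{j} w_j \ln w_j &= \ln 2, \\
\Delta_D &= \frac{\tfrac{11}{4}\ln 2 - 2\ln 2}{\ln 2} = \tfrac{3}{4} = 1 - \frac{A}{S}.
\end{align*}
```

Three of the four species in each community are unshared, which is the proportion the measure returns.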
Proof. For Part (A), see Appendix S6 in Chao, Jost et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in a three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, the Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++},$$

$$H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$
(See Appendix B for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \qquad \text{(C2)}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:
$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C3)}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \qquad \text{(C4)}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are $K$ regions, with allele relative frequency $p_{i|+k}$ for allele $i$ in region $k$ and region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} w_{+k} p_{i|+k}\Big) \ln\Big(\sum_{m=1}^{K} w_{+m} p_{i|+m}\Big).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.
To prove Eq. (C4), we consider the following two-level hierarchy (region $k$ and all populations within region $k$): there are $J_k$ populations, with allele relative frequency $p_{i|jk}$ for allele $i$ in population $j$ and population weights $(w_{1k}/w_{+k},\, w_{2k}/w_{+k},\, \ldots,\, w_{J_k k}/w_{+k})$, $j = 1, 2, \ldots, J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region $k$, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln\frac{w_{jk}}{w_{+k}}.$$

Summing over $k$ with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}.$$
This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \qquad \text{(C5)}$$
Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})}. \qquad \text{(C6)}$$
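Eqs. (C5) and (C6) can be sketched in a few lines of R. The function below is a minimal illustration, assuming raw counts with alleles in rows and populations in columns, weights proportional to population size, and a character vector giving each population's region; the names `shannon`, `three_level_diff`, and the toy data are hypothetical and not part of iDIP.

```r
# Shannon entropy of a probability vector (zero entries contribute nothing)
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Three-level Shannon differentiation, Eqs. (C5) and (C6).
# counts: alleles x populations matrix of raw counts (no empty population)
# region: character vector, region label of each population
three_level_diff <- function(counts, region) {
  w_jk <- colSums(counts) / sum(counts)            # population weights w_{jk}
  w_k  <- sapply(split(w_jk, region), sum)         # region weights w_{+k}
  # pooled counts per region (alleles x regions)
  reg_counts <- sapply(names(w_k), function(r)
    rowSums(counts[, region == r, drop = FALSE]))
  H_gamma  <- shannon(rowSums(counts) / sum(counts))
  H_alpha2 <- sum(w_k  * apply(reg_counts, 2, function(x) shannon(x / sum(x))))
  H_alpha1 <- sum(w_jk * apply(counts,     2, function(x) shannon(x / sum(x))))
  max2 <- shannon(w_k)                             # -sum w_{+k} ln w_{+k}
  max1 <- -sum(w_jk * log(w_jk / w_k[region]))     # -sum w_{jk} ln(w_{jk}/w_{+k})
  c(Delta2 = (H_gamma - H_alpha2) / max2,
    Delta1 = (H_alpha2 - H_alpha1) / max1)
}

region <- c("A", "A", "B")

# Completely distinct populations: no allele occurs in more than one column
distinct <- cbind(c(3, 1, 0, 0, 0, 0),
                  c(0, 0, 2, 2, 0, 0),
                  c(0, 0, 0, 0, 5, 1))
res_distinct <- three_level_diff(distinct, region)

# Identical allele frequency distributions in all populations
identical_freq <- cbind(c(2, 4, 6), c(1, 2, 3), c(3, 6, 9))
res_identical <- three_level_diff(identical_freq, region)
```

The sketch requires at least one region with two or more populations, otherwise the denominator of `Delta1` is zero.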
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$H_\gamma = -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\big(w_{jk} p_{i|jk}\big)$$

$$= -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big[\ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big]$$

$$= -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk} - \sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\frac{w_{jk}}{w_{+k}} - \sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k}$$

$$= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k}$$

$$\equiv H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big].$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
As shown in the proof of Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.

Phylogenetic differentiation measures in a three-level hierarchy
For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++},$$

$$I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[I_\alpha^{(2)} - I_\alpha^{(1)}\big] + \big[I_\gamma - I_\alpha^{(2)}\big]. \qquad \text{(C7)}$$
Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth $T$:
$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C8)}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \qquad \text{(C9)}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
SUPPLEMENTARY INFORMATION
Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represent the ecosystem. Shannon entropies are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 diagram: columns Within, Between, Total and Decomposition; rows 3 Ecosystem (frequencies $p_{i|++}$, entropy $H_\gamma$), 2 Region (frequencies $p_{i|+1}$, $p_{i|+2}$; entropies $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$, $H_\alpha^{(2)}$) and 1 Population or Community (frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$; entropies $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$, $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$, $H_\alpha^{(1)}$).]
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\ p_1 + p_2,\ p_1 + p_2 + p_3,\ p_4 + p_5\}$, where $p_i$ is the relative abundance of species $i$.

[Figure S2 diagram: ultrametric tree with tips $p_1$ to $p_5$, internal nodes labelled $p_1 + p_2$, $p_4 + p_5$ and $p_1 + p_2 + p_3$, and the root labelled MRCA.]
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider that there are two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as follows:

Ecosystem
    Region 1: Population 1, Population 2
    Region 2: Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

          Pop1  Pop2  Pop3  Pop4  Pop5
Allele1      1    16     2    10    15
Allele2      0     0     0     5    14
Allele3      7    12    11     1     0
Allele4      0     5    14     1    21
Allele5      2     1     0    11    10
Allele6      0     1     3     2     0
The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure with any number of levels can be expressed in a similar manner.

          Pop1         Pop2         Pop3         Pop4         Pop5
Level3    Ecosystem    Ecosystem    Ecosystem    Ecosystem    Ecosystem
Level2    Region1      Region1      Region2      Region2      Region2
Level1    Population1  Population2  Population3  Population4  Population5

For the above simple example, the input data for the R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))
# Run R function
iDIP(Data, Struc)
Output is given below:

                   [,1]
D_gamma           5.272
D_alpha2          4.679
D_alpha1          3.540
D_beta2           1.127
D_beta1           1.322
Proportion2       0.300
Proportion1       0.700
Differentiation2  0.204
Differentiation1  0.310
We give simple interpretations for the above output (all "effective" numbers are in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 ≈ 5.272 (= D_gamma).
(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 ≈ 4.679 allele equivalents per region (= D_alpha2).
(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.310, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
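The relationships among the reported quantities can be verified directly from the printed output. The short R sketch below copies the rounded values from the output above (variable names are our own); it checks the multiplicative decomposition and shows that the two proportions are the shares of the log beta diversities, which is how `Prop` is computed from entropy differences in the iDIP code.

```r
# Rounded values copied from the iDIP output for the example data
D_gamma  <- 5.272
D_alpha2 <- 4.679
D_alpha1 <- 3.540
D_beta2  <- 1.127
D_beta1  <- 1.322

# Multiplicative decomposition: gamma = alpha1 * beta1 * beta2 (up to rounding)
gamma_from_parts <- D_alpha1 * D_beta1 * D_beta2

# On the log (entropy) scale the decomposition is additive, so each level's
# share of the total beta information is log(beta) / sum(log(beta))
beta_info  <- log(c(level2 = D_beta2, level1 = D_beta1))
proportion <- beta_info / sum(beta_info)

print(round(gamma_from_parts, 2))
print(round(proportion, 2))
```

Small discrepancies in the third decimal place are expected because the inputs are already rounded.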
R code for Information-based Diversity Partitioning for Any Number of Levels

# input data:
#   abun:  alleles-by-population frequency matrix (or species-by-community frequency matrix)
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix)
# output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
#   (2) Proportion of total beta information found at each level.
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.

# Main function iDIP
iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga > 0] / n                     # pooled (ecosystem) relative frequencies
  G = sum(-gp * log(gp))                  # gamma entropy
  S = length(gp)

  H = nrow(struc)                         # number of hierarchical levels
  A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
  Diff = numeric(H-1); Prop = numeric(H-1)

  # lowest level (populations): weights, weight entropy and alpha entropy
  wi = colSums(abun) / n
  W[H-1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[,k] / sum(abun[,k]))
  Ai = sapply(1:N, function(k) -sum(pi[,k][pi[,k] > 0] * log(pi[,k][pi[,k] > 0])))
  A[H-1] = sum(wi * Ai)

  # intermediate levels: pool populations belonging to the same aggregate
  if (H > 2) {
    for (i in 2:(H-1)) {
      I = unique(struc[i,]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i,] == I[j])
        if (length(II) == 1) { ai[,j] = abun[,II]
        } else { ai[,j] = rowSums(abun[,II]) }
      }
      pi = sapply(1:NN, function(k) ai[,k] / sum(ai[,k]))
      wi = colSums(ai) / sum(ai)
      W[i-1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[,k][pi[,k] > 0] * log(pi[,k][pi[,k] > 0])))
      A[i-1] = sum(wi * Ai)
    }
  }

  total = G - A[H-1]                      # total beta information
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H-1)) {
      Diff[i] = (A[i-1] - A[i]) / (W[i] - W[i-1])
      Prop[i] = (A[i-1] - A[i]) / total
      B[i] = exp(A[i-1]) / exp(A[i])
    }
  }

  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha", (H-1):1),
                     paste0("D_beta", (H-1):1),
                     paste0("Proportion", (H-1):1),
                     paste0("Differentiation", (H-1):1))
  return(out)
}
Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described for species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) = paste0("Allele",1:6)
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")
# Run R function
iDIPphylo(Data, Struc, Tree)
Output is given below:

                 [,1]
Faith's PD    321.525
mean_T         94.169
PD_gamma      274.388
PD_alpha2     255.194
PD_alpha1     223.231
PD_beta2        1.075
PD_beta1        1.143
PD_prop2        0.351
PD_prop1        0.649
PD_diff2        0.124
PD_diff1        0.149
The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 ≈ 274.388 (= PD_gamma).
(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 ≈ 255.194 (= PD_alpha2).
(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

# input data:
#   abun:  alleles-by-population frequency matrix (or species-by-community frequency matrix)
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix)
#   tree:  a Newick-format phylogenetic tree spanned by all focal species considered in a study
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA"
#       in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.
# output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root node
#       to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level.
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information
#       found in Level 1 and Level 2, respectively.
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1
#       measures the mean phylogenetic dissimilarity among communities (Level 1) within a
#       region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among
#       regions (Level 2), etc.

# Three packages ("ade4", "ape" and "phytools") must be installed first.
install.packages("ade4"); library(ade4)
install.packages("ape"); library(ape)
install.packages("phytools"); library(phytools)

# Main function iDIPphylo
iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, b] = 1 if branch/node b lies on the path from the root to leaf i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i,][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }

  # pA: abundance of each branch/node (rows) in each population (columns)
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) { pA[,i] = Temp[,i] %*% M }
  pB = c(phyloData$leaves, phyloData$nodes)   # branch lengths
  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n; TT = sum(gp * pB)              # mean_T
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])                        # Faith's PD

  H = nrow(struc)
  A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
  Diff = numeric(H-1); Prop = numeric(H-1)

  wi = colSums(abun) / n
  W[H-1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[,k] / sum(abun[,k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[,k] > 0] * pi[,k][pi[,k] > 0] / TT * log(pi[,k][pi[,k] > 0] / TT)))
  A[H-1] = sum(wi * Ai)

  if (H > 2) {
    for (i in 2:(H-1)) {
      I = unique(struc[i,]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i,] == I[j])
        if (length(II) == 1) { pi[,j] = pA[,II] / sum(abun[,II]); ni[j] = sum(abun[,II])
        } else { pi[,j] = rowSums(pA[,II]) / sum(abun[,II]); ni[j] = sum(abun[,II]) }
      }
      # pi = sapply(1:NN, function(k) ai[,k] / sum(ai[,k]))
      wi = ni / sum(ni)
      W[i-1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[,k] > 0] * pi[,k][pi[,k] > 0] / TT * log(pi[,k][pi[,k] > 0] / TT)))
      A[i-1] = sum(wi * Ai)
    }
  }

  total = G - A[H-1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H-1)) {
      Diff[i] = (A[i-1] - A[i]) / (W[i] - W[i-1])
      Prop[i] = (A[i-1] - A[i]) / total
      B[i] = exp(A[i-1]) / exp(A[i])
    }
  }

  Gamma = exp(G) * TT; Alpha = exp(A) * TT
  # Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  # out1 = iDIP(abun, struc); out = cbind(out1, out2)
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H-1):1),
                     paste0("PD_beta", (H-1):1),
                     paste0("PD_prop", (H-1):1),
                     paste0("PD_diff", (H-1):1))
  return(out)
}
reference point is the age of the root of the phylogenetic tree spanned by all elements. Assume that there are B branch segments in the tree and thus there are B corresponding nodes, B ≥ S. The set of species/alleles is expanded to include also the internal nodes as well as the terminal nodes representing species/alleles, which will then be the first S elements (see Figure S2).

Let $L_i$ denote the length of branch i in the tree, i = 1, 2, …, B. We first expand the set of relative abundances of elements $(p_1, p_2, \ldots, p_S)$ (see Equation 1) to a larger set $\{a_i,\ i = 1, 2, \ldots, B\}$ by defining $a_i$ as the total relative abundance of the elements descended from the ith node/branch, i = 1, 2, …, B. In phylogenetic diversity, an important parameter is the mean branch length $\bar{T}$, the abundance-weighted mean of the distances from the tree base to each of the terminal branch tips, that is, $\bar{T} = \sum_{i=1}^{B} L_i a_i$. For an ultrametric tree, the mean branch length is simply reduced to the tree depth T; see Figure 1 in Chao, Chiu, and Jost (2010) for an example. For simplicity, our following formulation of phylogenetic diversity is based on ultrametric trees. The extension to nonultrametric trees is straightforward (via replacing T by $\bar{T}$ in all formulas).

Chao et al. (2010, 2014) generalized Hill numbers to a class of phylogenetic diversity of order q, $^{q}PD$, derived as given in Equation 4. This measure quantifies the effective total branch length during the time interval from T years ago to the present. If q = 0, then $^{0}PD = \sum_{i=1}^{B} L_i$, which is the well-known Faith's PD, the sum of the branch lengths of a phylogenetic tree connecting all species. However, this measure does not consider species abundances. Rao's quadratic entropy Q (Rao & Nayak, 1985) is a widely used measure which takes into account both phylogeny and species abundances. This measure is a generalization of the Gini–Simpson index and quantifies the average phylogenetic distance between any two individuals randomly selected from the assemblage. Chao et al. (2010) showed that the $^{q}PD$ measure of order q = 2 is a simple transformation of quadratic entropy, that is, $^{2}PD = T/(1 - Q/T)$. Again, here we focus on the $^{q}PD$ measure of order q = 1, which can be expressed as a function of the phylogenetic entropy (Allen, Kon, & Bar-Yam, 2009); see Equation 5.

Here I denotes the phylogenetic entropy (Equation 6), which is a generalization of Shannon's entropy that incorporates phylogenetic distances among elements. Note that when there are only tip nodes and all branches have unit length, then we have T = 1 and $^{q}PD$ reduces to the Hill number of order q (in Equation 1).
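A small R sketch may help to connect the branch-set abundances $a_i$ with Equations 5 and 6. It uses a hypothetical five-tip ultrametric tree of depth 3 with the same topology as Figure S2; the branch lengths, abundances and the function name `phylo_div1` are our own assumptions for illustration.

```r
# Order-1 phylogenetic diversity (Equation 5) from branch lengths L and
# branch/node abundances a; hypothetical helper, not part of iDIPphylo
phylo_div1 <- function(L, a) {
  Tbar <- sum(L * a)                             # mean branch length (tree depth if ultrametric)
  keep <- a > 0
  I <- -sum(L[keep] * a[keep] * log(a[keep]))    # phylogenetic entropy (Equation 6)
  Tbar * exp(I / Tbar)                           # T * exp(I / T)
}

# Tip relative abundances p_1..p_5 (assumed values)
p <- c(0.10, 0.20, 0.30, 0.25, 0.15)

# Extended abundance set: 5 tips, then nodes (1,2), (1,2,3) and (4,5)
a <- c(p, p[1] + p[2], p[1] + p[2] + p[3], p[4] + p[5])

# Branch lengths chosen so that every root-to-tip distance equals 3
L <- c(1, 1, 2, 2, 2,   # tips 1..5
       1, 1, 1)         # internal branches (1,2), (1,2,3), (4,5)

Tbar     <- sum(L * a)   # abundance-weighted mean root-to-tip distance
faith_pd <- sum(L)       # order-0 value, Faith's PD
pd1      <- phylo_div1(L, a)

# Special case: tips only, unit branch lengths, equal abundances
hill_like <- phylo_div1(rep(1, 4), rep(0.25, 4))
```

In this toy tree `Tbar` equals the tree depth whatever the tip abundances are, `pd1` lies between the tree depth and Faith's PD, and the tips-only case returns the Hill number of order 1 for four equally common elements.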
4.2.2 | Phylogenetic diversity decomposition in a multiple-level hierarchically structured system

The single-aggregate formulation can be extended to consider a hierarchical spatially structured system. For the sake of simplicity, we consider three levels (ecosystem, region and community/population), as we did for the species/allelic diversity decomposition. Assume that there are S elements in the ecosystem. For the rooted phylogenetic tree spanned by all S elements in the ecosystem, we define root (or a time reference point), number of nodes/branches B and branch length $L_i$ in a similar manner as those in a single aggregate.

For the tip nodes, as in the framework of species and allelic diversity (in Table 2), define $p_{i|jk}$, $p_{i|+k}$ and $p_{i|++}$, i = 1, 2, …, S, as the ith species or allele relative frequencies at the population, regional and ecosystem level, respectively. To expand these relative frequencies to the branch set, we define $a_{i|jk}$, i = 1, 2, …, B, as the summed relative abundance of the species/alleles descended from the ith node/branch in population j and region k, with similar definitions for $a_{i|+k}$ and $a_{i|++}$, i = 1, 2, …, B; see Figure 1 of Chao et al. (2015) for an illustrative example. The decomposition for phylogenetic diversity is similar to that for Hill numbers presented in Table 1, except that now all measures are replaced by phylogenetic diversity. The corresponding phylogenetic gamma, alpha and beta diversities at each level are
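$${}^{q}PD = \left\{\sum_{i=1}^{B} L_i \left(\frac{a_i}{T}\right)^{q}\right\}^{1/(1-q)} \qquad (4)$$

$${}^{1}PD = \lim_{q \to 1} {}^{q}PD = \exp\left[-\sum_{i=1}^{B} L_i \frac{a_i}{T} \ln\left(\frac{a_i}{T}\right)\right] \equiv T \exp(I/T) \qquad (5)$$

$$I = -\sum_{i=1}^{B} L_i a_i \ln a_i \qquad (6)$$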
TABLE 1. Various diversities in a hierarchically structured system and their decomposition based on diversity measure $D = {}^{1}D$ (Hill number of order q = 1 in Equation 2); for phylogenetic diversity decomposition, replace D with $PD = {}^{1}PD$ (phylogenetic diversity measure of order q = 1 in Equation 5); see Table 3 for all formulas for D and PD. The superscripts (1) and (2) denote the hierarchical level of focus.

Hierarchical level | Within | Between | Total | Decomposition
3 Ecosystem | − | − | $D_\gamma$ | $D_\gamma = D_\alpha^{(1)} D_\beta^{(1)} D_\beta^{(2)}$
2 Region | $D_\alpha^{(2)}$ | $D_\beta^{(2)} = D_\gamma^{(2)}/D_\alpha^{(2)}$ | $D_\gamma^{(2)} = D_\gamma$ | $D_\gamma = D_\alpha^{(2)} D_\beta^{(2)}$
1 Community or population | $D_\alpha^{(1)}$ | $D_\beta^{(1)} = D_\gamma^{(1)}/D_\alpha^{(1)}$ | $D_\gamma^{(1)} = D_\alpha^{(2)}$ | $D_\alpha^{(2)} = D_\alpha^{(1)} D_\beta^{(1)}$
TABLE 2. Calculation of allele/species relative frequencies at the different levels of the hierarchical structure.

Hierarchical level | Species/allele relative frequency
Population | $p_{i|jk} = N_{ijk}/N_{+jk} = N_{ijk}/\sum_{i=1}^{S} N_{ijk}$
Region | $p_{i|+k} = N_{i+k}/N_{++k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$
Ecosystem | $p_{i|++} = N_{i++}/N_{+++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk}$
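The two routes to each frequency in Table 2 (pooled counts versus weighted averages of population frequencies) can be compared numerically. The R sketch below assumes weights proportional to population size, $w_{jk} = N_{+jk}/N_{+++}$; the count matrix, region labels and variable names are made up for illustration.

```r
# Hypothetical counts N_ijk: alleles in rows, populations in columns
N <- cbind(Pop1 = c(1, 0, 7), Pop2 = c(16, 0, 12), Pop3 = c(2, 5, 11))
region <- c("R1", "R1", "R2")          # region membership of each population

# Population level: p_{i|jk} = N_ijk / N_+jk
p_pop <- sweep(N, 2, colSums(N), "/")

# Size-proportional weights w_jk and region weights w_+k
w_jk <- colSums(N) / sum(N)
w_k  <- sapply(split(w_jk, region), sum)

# Region level, route 1: pooled counts N_i+k / N_++k
N_reg <- sapply(names(w_k), function(r) rowSums(N[, region == r, drop = FALSE]))
p_reg_counts <- sweep(N_reg, 2, colSums(N_reg), "/")

# Region level, route 2: sum_j (w_jk / w_+k) p_{i|jk}
p_reg_weights <- sapply(names(w_k), function(r) {
  idx <- region == r
  as.vector(p_pop[, idx, drop = FALSE] %*% (w_jk[idx] / w_k[[r]]))
})

# Ecosystem level: N_i++ / N_+++ versus sum_k sum_j w_jk p_{i|jk}
p_eco_counts  <- rowSums(N) / sum(N)
p_eco_weights <- as.vector(p_pop %*% w_jk)
```

With other choices of weights (for example equal weights), only the weighted-average forms apply and they no longer coincide with the pooled-count forms.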
given in Table 3, along with the corresponding differentiation measures. Appendix S3 presents all mathematical derivations and discusses the desirable monotonicity and "true dissimilarity" properties that our proposed differentiation measures possess.

5 | IMPLEMENTATION OF THE FRAMEWORK BY MEANS OF AN R PACKAGE

The framework described above has been implemented in the R function iDIP (information-based Diversity Partitioning), which is provided as Data S1. We also provide a short introduction with a simple example data set to explain how to obtain numerical results equivalent to those provided in Tables 4 and 5 below for the Hawaiian archipelago example data set.

The R function iDIP requires two input matrices:

1. Abundance data specifying species/alleles (rows) raw or relative abundances for each population/community (columns).
2. Structure matrix describing the hierarchical structure of spatial subdivision; see a simple example given in Data S1. There is no limit to the number of spatial subdivisions.

The output includes (i) gamma (or total) diversity, alpha and beta diversity for each level; (ii) proportion of total beta information (among aggregates) found at each level; and (iii) mean differentiation (dissimilarity) at each level.

We also provide the R function iDIPphylo, which implements an information-based decomposition of phylogenetic diversity and therefore can take into account the evolutionary history of the species being studied. This function requires the two matrices mentioned above plus a phylogenetic tree in Newick format. For interested users without knowledge of R, we also provide an online version available from https://chao.shinyapps.io/iDIP/. This interactive web application was developed using Shiny (https://shiny.rstudio.com). The webpage contains tabs providing a short introduction describing how to use the tool, along with a detailed User's Guide, which provides proper interpretations of the output through numerical examples.

6 | SIMULATION STUDY TO SHOW THE CHARACTERISTICS OF THE FRAMEWORK

Here, we describe a simple simulation study to demonstrate the utility and numerical behaviour of the proposed framework. We considered an ecosystem composed of 32 populations divided into four hierarchical levels (ecosystem, region, subregion, population; Figure 1). The number of populations at each level was kept constant across all simulations (i.e., ecosystem with 32 populations, regions with 16 populations each and subregions with eight
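TABLE 3. Formulas for α, β and γ along with differentiation measures at each hierarchical level of spatial subdivision for species/allelic diversity and phylogenetic diversity. Here $D = {}^{1}D$ (Hill number of order q = 1 in Equation 2), $PD = {}^{1}PD$ (phylogenetic diversity of order q = 1 in Equation 5), T denotes the depth of an ultrametric tree, H = Shannon entropy (Equation 2), I = phylogenetic entropy (Equation 6).

Hierarchical level | Diversity | Species/allelic diversity | Phylogenetic diversity
Level 3: Ecosystem | gamma | $D_\gamma = \exp\{-\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}\} \equiv \exp(H_\gamma)$ | $PD_\gamma = T \times \exp\{-\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++} / T\} \equiv T \times \exp(I_\gamma/T)$
Level 2: Region | gamma | $D_\gamma^{(2)} = D_\gamma$ | $PD_\gamma^{(2)} = PD_\gamma$
Level 2: Region | alpha | $D_\alpha^{(2)} = \exp(H_\alpha^{(2)})$, where $H_\alpha^{(2)} = \sum_k w_{+k} H_{\alpha k}^{(2)}$ and $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$ | $PD_\alpha^{(2)} = T \times \exp(I_\alpha^{(2)}/T)$, where $I_\alpha^{(2)} = \sum_k w_{+k} I_{\alpha k}^{(2)}$ and $I_{\alpha k}^{(2)} = -\sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k}$
Level 2: Region | beta | $D_\beta^{(2)} = D_\gamma^{(2)}/D_\alpha^{(2)}$ | $PD_\beta^{(2)} = PD_\gamma^{(2)}/PD_\alpha^{(2)}$
Level 1: Population or community | gamma | $D_\gamma^{(1)} = D_\alpha^{(2)}$ | $PD_\gamma^{(1)} = PD_\alpha^{(2)}$
Level 1: Population or community | alpha | $D_\alpha^{(1)} = \exp(H_\alpha^{(1)})$, where $H_\alpha^{(1)} = \sum_{j,k} w_{jk} H_{\alpha jk}^{(1)}$ and $H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}$ | $PD_\alpha^{(1)} = T \times \exp(I_\alpha^{(1)}/T)$, where $I_\alpha^{(1)} = \sum_{j,k} w_{jk} I_{\alpha jk}^{(1)}$ and $I_{\alpha jk}^{(1)} = -\sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}$
Level 1: Population or community | beta | $D_\beta^{(1)} = D_\gamma^{(1)}/D_\alpha^{(1)}$ | $PD_\beta^{(1)} = PD_\gamma^{(1)}/PD_\alpha^{(1)}$

Differentiation among aggregates at each level | Species/allelic diversity | Phylogenetic diversity
Level 2: Among regions | $\Delta_D^{(2)} = \dfrac{H_\gamma - H_\alpha^{(2)}}{-\sum_k w_{+k} \ln w_{+k}}$ | $\Delta_{PD}^{(2)} = \dfrac{I_\gamma - I_\alpha^{(2)}}{-T \sum_k w_{+k} \ln w_{+k}}$
Level 1: Population/community within region | $\Delta_D^{(1)} = \dfrac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{j,k} w_{jk} \ln(w_{jk}/w_{+k})}$ | $\Delta_{PD}^{(1)} = \dfrac{I_\alpha^{(2)} - I_\alpha^{(1)}}{-T \sum_{j,k} w_{jk} \ln(w_{jk}/w_{+k})}$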
populations each) Note that herewe used a hierarchywith fourspatialsubdivisionsinsteadofthreelevelsasusedinthepresenta-tion of the framework This decision was based on the fact thatwewantedtosimplifythepresentationofcalculations(threelevelsused)andinthesimulations(fourlevelsused)wewantedtoverifytheperformanceoftheframeworkinamorein-depthmanner
Weexploredsixscenariosvaryinginthedegreeofgeneticstruc-turingfromverystrong(Figure2topleftpanel)toveryweak(Figure2bottomrightpanel)and foreachwegeneratedspatiallystructuredgeneticdatafor10unlinkedbi-allelic lociusinganalgorithmlooselybased on the genetic model of Coop Witonsky Di Rienzo andPritchard (2010) More explicitly to generate correlated allele fre-quenciesacrosspopulationsforbi-alleliclociwedraw10randomvec-tors of dimension 32 from a multivariate normal distribution with mean zero and a covariancematrix corresponding to the particulargeneticstructurescenariobeingconsideredToconstructthecovari-ancematrixwe firstassumed that thecovariancebetweenpopula-tions decreased with distance so that the off-diagonal elements(covariances)forclosestgeographicneighboursweresetto4forthesecond nearest neighbours were set to 3 and so on as such the main diagonalvalues(variance)weresetto5Bymultiplyingtheoff-diagonalelementsofthisvariancendashcovariancematrixbyaconstant(δ)wema-nipulated the strength of the spatial genetic structure from strong(δ=01Figure2)toweak(δ=6Figure2)Deltavalueswerechosentodemonstrategradualchangesinestimatesacrossdiversitycompo-nentsUsing this procedurewegenerated amatrix of randomnor-mally distributed N(01)deviatesɛilforeachpopulation i and locus l The randomdeviateswere transformed into allele frequencies con-strainedbetween0and1usingthesimpletransform
where $p_{il}$ is the relative frequency of allele A1 at the lth locus in population i, and therefore $q_{il} = 1 - p_{il}$ is the relative frequency of allele A2. Each bi-allelic locus was analysed separately by our framework, and estimated values of Dγ, Dα and Dβ for each spatial level (see Figure 1) were averaged across the 10 loci.
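The generator of correlated allele frequencies described above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' code: populations are placed on a one-dimensional chain so that the covariance at neighbour distance d is max(5 − d, 0), negative eigenvalues of the rescaled matrix are clipped to zero so that sampling is always possible, and the function name and its arguments are hypothetical.

```python
import numpy as np

def simulate_allele_frequencies(n_pops=32, n_loci=10, delta=0.1, seed=0):
    """Sketch: correlated allele frequencies for bi-allelic loci.

    Assumptions (not from the paper): populations lie on a 1-D chain,
    and the covariance matrix is projected onto the positive
    semi-definite cone by clipping negative eigenvalues.
    """
    rng = np.random.default_rng(seed)
    idx = np.arange(n_pops)
    dist = np.abs(idx[:, None] - idx[None, :])
    # Variance 5 on the diagonal; covariances 4, 3, ... decaying with distance.
    cov = np.maximum(5.0 - dist, 0.0)
    off_diagonal = ~np.eye(n_pops, dtype=bool)
    cov[off_diagonal] *= delta  # delta rescales only the covariances
    # Sample from N(0, cov) through an eigendecomposition.
    vals, vecs = np.linalg.eigh(cov)
    root = vecs * np.sqrt(np.clip(vals, 0.0, None))
    eps = root @ rng.standard_normal((n_pops, n_loci))
    # Truncate the deviates to [0, 1] to obtain frequencies of allele A1.
    return np.clip(eps, 0.0, 1.0)

p = simulate_allele_frequencies(delta=0.1, seed=1)
q = 1.0 - p  # frequencies of allele A2
```

Each column of `p` corresponds to one locus and each row to one population, so a locus-by-locus analysis simply iterates over columns.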
To simulate a realistic distribution of the number of individuals across populations, we generated random values from a log-normal distribution with mean 0 and log of standard deviation 1; these values were then multiplied by randomly generated deviates from a Poisson distribution with λ = 30 to obtain a wide range of population/community sizes. Rounded values (to mimic abundances of individuals) were then multiplied by $p_{il}$ and $q_{il}$ to generate allele abundances. Given that the number of individuals was randomly generated across populations, there is no spatial correlation in abundance of individuals across the landscape, which means that the genetic spatial patterns were solely determined by the variance–covariance matrix used to generate correlated allele frequencies across populations. This facilitates interpretation of the simulation results, allowing us to demonstrate that the framework can uncover subtle spatial effects associated with population connectivity (see below).
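The abundance step can be sketched as follows. The sketch assumes that "log of standard deviation 1" corresponds to `sigma=1.0` on the log scale; the function name and the toy frequency matrix are hypothetical and serve only as illustration.

```python
import numpy as np

def simulate_allele_abundances(p, lam=30.0, seed=0):
    """Sketch: population sizes and allele abundances for bi-allelic loci.

    p is an (n_pops, n_loci) array of frequencies of allele A1.
    Sizes are log-normal deviates times Poisson deviates, rounded.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)
    n_pops = p.shape[0]
    sizes = np.round(
        rng.lognormal(mean=0.0, sigma=1.0, size=n_pops)
        * rng.poisson(lam=lam, size=n_pops)
    )
    a1 = sizes[:, None] * p          # abundance of allele A1
    a2 = sizes[:, None] * (1.0 - p)  # abundance of allele A2
    return sizes, a1, a2

toy_p = np.full((4, 3), 0.25)  # hypothetical frequencies, 4 populations x 3 loci
sizes, a1, a2 = simulate_allele_abundances(toy_p, seed=1)
```

Because sizes are drawn independently of position, any spatial pattern in the allele abundances comes from `p` alone, which is the property the paragraph above relies on.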
For each spatial structure, we generated 100 matrices of allele frequencies, and each matrix was analysed separately to obtain distributions for Dγ, Dα, Dβ and ΔD. Figure 2 presents heat maps of the correlation in allele frequencies across populations for one simulated data set under each δ value and shows that our algorithm can generate a wide range of genetic structures comparable to those generated by other, more complex simulation protocols (e.g., de Villemereuil, Frichot, Bazin, Francois, & Gaggiotti, 2014).
Figure 3 shows the distribution of Dα, Dβ and ΔD values for the three levels of geographic variation below the ecosystem level (i.e., Dγ, genetic diversity). The results clearly show that our framework detects differences in genetic diversity across different levels of spatial genetic structure. As expected, the effective number of alleles (Dα component, top row) increases per region and subregion as the spatial structure becomes weaker (i.e., from small to large δ values) but remains constant at the population level, as there is no spatial structure at this level (i.e., populations are panmictic), so diversity is independent of δ.
The Dβ component (middle row) quantifies the effective number of aggregates (regions, subregions, populations) at each hierarchical level of spatial subdivision. The larger the number of aggregates at a given level, the more heterogeneous that level is. Thus, it is also a measure of compositional dissimilarity at each level. We use this interpretation to describe the results in a more intuitive manner. As expected, as δ increases, dissimilarity between regions (middle left panel) decreases because spatial genetic structure becomes weaker, and the compositional dissimilarity among populations within subregions (middle right panel) increases because the strong spatial correlation among populations within subregions breaks down (Figure 3, centre left panel). The compositional dissimilarity between subregions within regions (Figure 3, middle centre panel) first increases and then decreases with increasing δ. This is due to an "edge effect" associated with the marginal status of the subregions at the extremes of the species range (extreme right and left subregions in Figure 2). As δ increases, the composition of the two subregions at the centre of the species range, which belong to different regions, changes more rapidly than that of the two marginal subregions. Thus, the compositional dissimilarity between subregions within regions increases. However, as δ continues to increase, spatial effects disappear and dissimilarity decreases.
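$$p_{il} = \begin{cases} 0 & \text{if } \varepsilon_{il} < 0 \\ \varepsilon_{il} & \text{if } 0 \le \varepsilon_{il} \le 1 \\ 1 & \text{if } \varepsilon_{il} > 1 \end{cases}$$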
FIGURE 2 Heat maps of allele frequency correlations between pairs of populations for different δ values. Delta values control the strength of the spatial genetic structure among populations, with low δs having the strongest spatial correlation among populations. Each heat map represents the outcome of a single simulation, and each dot represents the allele frequency correlation between two populations. Thus, the diagonal represents the correlation of a population with itself and is always 1, regardless of the δ value considered in the simulation. Colours indicate range of correlation values. As in Figure 1, the dendrograms represent the spatial relationship (i.e., geographic distance) between populations
The differentiation component ΔD (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity Dβ. This is expected, as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix in which different δ values were used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength of the spatial genetic variation.

FIGURE 3 Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity gamma is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions)
For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ: Dγ = 1.6 on average across simulations for δ = 0.1, up to Dγ = 1.9 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here regions, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.
7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM
All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish, Etelis coruscans (Andrews et al., 2014), and a shallow-water fish, Zebrasoma flavescens (Eble et al., 2011).
The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km south-west of the MHI, at Johnston Atoll, and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).

FIGURE 4 Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass
We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.
7.1 | Species diversity
Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, Dγ, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D^{(2)}_\alpha = 37.77$; Island: $D^{(1)}_\alpha = 27.75$). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D^{(2)}_\beta = 1.29$ represents the number of region equivalents in the Hawaiian archipelago, while $D^{(1)}_\beta = 1.361$ is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on the sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain ΔD (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
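The decomposition reported in Table 4 can be illustrated with a small Python sketch for a three-level hierarchy (ecosystem, regions, islands). It is a sketch under assumptions rather than the authors' R package: weights are taken proportional to total abundance, every population is assumed non-empty, the entropies are the usual weighted Shannon entropies of pooled frequencies, and the function names are hypothetical. The normalizers follow the denominators in the bottom section of Table 3.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (natural log) of a frequency vector; zeros are skipped."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def decompose_q1(abundances, regions):
    """Sketch: order q = 1 decomposition for populations nested in regions.

    abundances: (n_pops, n_types) array; regions: region label per population.
    Assumption: weights w_jk are proportional to population totals.
    """
    x = np.asarray(abundances, dtype=float)
    regions = np.asarray(regions)
    w = x.sum(axis=1) / x.sum()               # w_jk, summing to 1
    p = x / x.sum(axis=1, keepdims=True)      # within-population frequencies
    h_gamma = shannon_entropy(w @ p)          # entropy of the pooled ecosystem
    h_alpha1 = float(sum(wj * shannon_entropy(pj) for wj, pj in zip(w, p)))
    h_alpha2, norm2, norm1 = 0.0, 0.0, 0.0
    for k in np.unique(regions):
        m = regions == k
        w_k = float(w[m].sum())               # w_+k
        h_alpha2 += w_k * shannon_entropy((w[m] @ p[m]) / w_k)
        norm2 -= w_k * float(np.log(w_k))
        norm1 -= float((w[m] * np.log(w[m] / w_k)).sum())
    d_gamma, d_alpha2, d_alpha1 = np.exp([h_gamma, h_alpha2, h_alpha1])
    return {
        "D_gamma": float(d_gamma),
        "D_alpha2": float(d_alpha2),
        "D_beta2": float(d_gamma / d_alpha2),
        "D_alpha1": float(d_alpha1),
        "D_beta1": float(d_alpha2 / d_alpha1),
        "Delta2": (h_gamma - h_alpha2) / norm2 if norm2 > 0 else float("nan"),
        "Delta1": (h_alpha2 - h_alpha1) / norm1 if norm1 > 0 else float("nan"),
    }

# Toy data: two regions with two identical islands each and no shared species.
toy = [[5, 5, 0, 0], [5, 5, 0, 0], [0, 0, 5, 5], [0, 0, 5, 5]]
result = decompose_q1(toy, regions=[0, 0, 1, 1])
```

In this toy case the regions share no species, so differentiation among regions is complete, while islands within a region are identical and show no differentiation. The multiplicative chain Dγ = D(1)α × D(1)β × D(2)β holds by construction.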
We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than the diversities of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and, to a lesser extent, Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
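The comparison of orders q = 0, 1 and 2 can be reproduced with the standard Hill number formula. The sketch below uses made-up abundances: only the 55.1% share of the dominant species echoes the text, and the 49 rare species are purely hypothetical.

```python
import numpy as np

def hill_number(abundances, q):
    """Hill number (effective number of types) of order q."""
    x = np.asarray(abundances, dtype=float)
    p = x[x > 0] / x.sum()
    if q == 1:
        # Limit q -> 1: exponential of Shannon entropy.
        return float(np.exp(-(p * np.log(p)).sum()))
    return float((p ** q).sum() ** (1.0 / (1.0 - q)))

# Hypothetical community: one species at 55.1%, the rest spread over 49 rare species.
dominated = [55.1] + [44.9 / 49] * 49
# Hypothetical perfectly even community of 10 species.
even = [1.0] * 10

profile_dominated = [hill_number(dominated, q) for q in (0, 1, 2)]
profile_even = [hill_number(even, q) for q in (0, 1, 2)]
```

For the even community all three orders coincide with richness, whereas for the dominated community the values drop sharply from q = 0 to q = 2 because rare species contribute less and less as q increases.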
Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.
7.2 | Genetic diversity
Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases, genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.

TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

Level | Diversity
3 Hawaiian Archipelago | $D_\gamma = 48.744$
2 Region | $D^{(2)}_\gamma = D_\gamma$; $D^{(2)}_\alpha = 37.773$; $D^{(2)}_\beta = 1.290$
1 Island (community) | $D^{(1)}_\gamma = D^{(2)}_\alpha$; $D^{(1)}_\alpha = 27.752$; $D^{(1)}_\beta = 1.361$

Differentiation among aggregates at each level
2 Region | $\Delta^{(2)}_D = 0.290$
1 Island (community) | $\Delta^{(1)}_D = 0.153$
Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.
As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).
8 | DISCUSSION
Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications, including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.
Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and, therefore, give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g., bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts the Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.

FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities; (b) genetic diversity for Etelis coruscans; (c) genetic diversity for Zebrasoma flavescens
As proved by Chao et al. (2015, Appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population, and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as G_ST and Jost's D, do not possess either of these two properties.
Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.

FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness; (b) allelic richness for Etelis coruscans; (c) allelic richness for Zebrasoma flavescens

TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level | Diversity
3 Hawaiian Archipelago | $D_\gamma = 8.249$
2 Region | $D^{(2)}_\gamma = D_\gamma$; $D^{(2)}_\alpha = 8.083$; $D^{(2)}_\beta = 1.016$
1 Island (population) | $D^{(1)}_\gamma = D^{(2)}_\alpha$; $D^{(1)}_\alpha = 7.077$; $D^{(1)}_\beta = 1.117$

Differentiation among aggregates at each level
2 Region | $\Delta^{(2)}_D = 0.023$
1 Island (community) | $\Delta^{(1)}_D = 0.062$
Recently, Karlin and Smouse (2017, Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
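The two-population example can be checked numerically. The sketch below implements the two-level case, in which the differentiation measure is the normalized mutual information, with equal population weights as an assumption; the function names are hypothetical, and Karlin and Smouse's measure is not reproduced here.

```python
import numpy as np

def entropy(v):
    """Shannon entropy (natural log), ignoring zero entries."""
    v = np.asarray(v, dtype=float)
    v = v[v > 0]
    return float(-(v * np.log(v)).sum())

def shannon_differentiation(freqs, weights=None):
    """Two-level differentiation: (H_gamma - H_alpha) / (-sum_j w_j ln w_j)."""
    p = np.asarray(freqs, dtype=float)       # rows: populations, each summing to 1
    n = p.shape[0]
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, dtype=float)
    h_gamma = entropy(w @ p)                  # entropy of the pooled population
    h_alpha = sum(wj * entropy(pj) for wj, pj in zip(w, p))
    return (h_gamma - h_alpha) / entropy(w)

# Populations I and II: 10 equally frequent alleles each, 4 of them shared.
pop1 = np.zeros(16)
pop2 = np.zeros(16)
pop1[:10] = 0.1                # alleles 0-9
pop2[:4] = 0.1                 # shared alleles 0-3
pop2[10:] = 0.1                # private alleles 10-15
delta = shannon_differentiation([pop1, pop2])   # equals 1 - A/S = 0.6
```

Analytically, the pooled entropy is ln S + (1 − A/S) ln N and the within-population entropy is ln S, so dividing their difference by ln N recovers 1 − A/S for any N, S and A.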
In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown, so all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would surely be observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., obtained by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
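The negative bias of the observed (plug-in) entropy can be seen in a small simulation. The community below is hypothetical (50 equally common species), and the sketch only illustrates the bias; it does not implement the estimator of Chao and Jost (2015).

```python
import numpy as np

def plugin_entropy(counts):
    """Observed Shannon entropy from sample counts (plug-in estimate)."""
    c = np.asarray(counts, dtype=float)
    c = c[c > 0]
    p = c / c.sum()
    return float(-(p * np.log(p)).sum())

def mean_plugin_entropy(true_p, n, reps=200, seed=0):
    """Average plug-in entropy over repeated multinomial samples of size n."""
    rng = np.random.default_rng(seed)
    return float(np.mean([plugin_entropy(rng.multinomial(n, true_p))
                          for _ in range(reps)]))

true_p = np.full(50, 1.0 / 50)          # hypothetical even community
true_h = float(np.log(50))              # true entropy
h_small = mean_plugin_entropy(true_p, n=30)     # severe undersampling
h_large = mean_plugin_entropy(true_p, n=3000)   # ample sampling
```

With only 30 individuals, at most 30 species can be observed, so the plug-in entropy cannot exceed ln 30 and necessarily falls below the true value of ln 50; with 3,000 individuals the average moves much closer to the truth.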
Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity Dβ and differentiation ΔD measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.
The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.
Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

Level | Diversity
3 Hawaiian Archipelago | $D_\gamma = 8.404$
2 Region | $D^{(2)}_\gamma = D_\gamma$; $D^{(2)}_\alpha = 8.290$; $D^{(2)}_\beta = 1.012$
1 Island (community) | $D^{(1)}_\gamma = D^{(2)}_\alpha$; $D^{(1)}_\alpha = 7.690$; $D^{(1)}_\beta = 1.065$

Differentiation among aggregates at each level
2 Region | $\Delta^{(2)}_D = 0.014$
1 Island (community) | $\Delta^{(1)}_D = 0.033$
We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization, and therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first, but somewhat limited, application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
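The percentages quoted above follow from the alpha diversities in Tables 4 to 6, read here with decimal points as 37.773 and 27.752 (fish species), 8.083 and 7.077 (E. coruscans), and 8.290 and 7.690 (Z. flavescens). A short arithmetic check, with a hypothetical helper name:

```python
def percent_drop(d_upper, d_lower):
    """Relative loss of effective numbers from the upper to the lower level, in %."""
    return 100.0 * (d_upper - d_lower) / d_upper

# (regional alpha diversity, island alpha diversity) from Tables 4-6.
alpha_by_dataset = {
    "fish species": (37.773, 27.752),
    "E. coruscans": (8.083, 7.077),
    "Z. flavescens": (8.290, 7.690),
}

drops = {name: percent_drop(a2, a1) for name, (a2, a1) in alpha_by_dataset.items()}
for name, value in drops.items():
    print(f"{name}: {value:.2f}% fewer equivalents at the island level")

# Regional beta diversity for fish species is the ratio gamma / regional alpha.
beta2_species = 48.744 / 37.773
```

The same ratio logic links every pair of adjacent levels, which is why the table values for species diversity multiply up the hierarchy; for the genetic tables the values are averages over loci, so the ratios hold only approximately.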
In our hierarchical framework and analysis based on the Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)'s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)'s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals, or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithms may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.
The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.
An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.
ACKNOWLEDGEMENTS
This work was assisted through participation in the "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award #DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED), with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.
DATA ARCHIVING STATEMENT
All data used in this manuscript are available in DRYAD (https://doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).
ORCID
Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493

Christine Edwards http://orcid.org/0000-0001-8837-4872
REFERENCES
Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2
Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101
Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.
Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665
Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.
Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.
Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654
Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551
Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247
Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272
Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540
Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471
Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014
Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819
Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083
Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1
Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2
Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036
Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75
Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352
Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x
Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x
Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1
Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x
Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum × falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.
Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.
Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.
Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3
Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x
Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.
Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x
Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x
Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x
Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.
Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x
Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.
Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.
Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696
Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Consortiu Hawaii Reef Connectivity (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B-Biological Sciences, 283(1829).
Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z
Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765
Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x
Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.
Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422
Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756
de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.
Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x
Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516
Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.
Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.
How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593
APPENDIX S1. Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
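The equivalence in this example can be checked numerically. The following is a minimal sketch, assuming the usual definitions He = 1 − Σ p_i² and effective number of alleles = 1/(1 − He); the function names `expected_het` and `effective_alleles` are illustrative and not part of the authors' code.

```r
# Allele frequencies from the table above (decimal points restored).
pop1 <- c(A1 = 0.5,  A2 = 0.5,  A3 = 0,     A4 = 0,     A5 = 0)
pop2 <- c(A1 = 0.01, A2 = 0.10, A3 = 0.665, A4 = 0.215, A5 = 0.01)

# Expected heterozygosity: He = 1 - sum(p_i^2).
expected_het <- function(p) 1 - sum(p^2)

# Effective number of alleles (diversity of order 2): 1 / (1 - He).
effective_alleles <- function(p) 1 / (1 - expected_het(p))

he <- c(pop1 = expected_het(pop1), pop2 = expected_het(pop2))
ne <- c(pop1 = effective_alleles(pop1), pop2 = effective_alleles(pop2))
print(round(he, 3))
print(round(ne, 2))
```

Both populations come out with He close to 0.5 and an effective number close to two, even though population 2 carries five distinct alleles.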
APPENDIX S2. Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol    Definition
${}^{q}D$    abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$    phylogenetic diversity of order q
$S$    total number of distinct elements (species/alleles)
$H$    Shannon entropy
$K$    number of regions
$J_k$    number of local populations/communities within region k
$w_{jk}$    weight given to population/community j of region k
$w_{+k}$    weight given to region k
$D_{\alpha}^{(l)}$    alpha abundance diversity at level l of the hierarchy
$D_{\beta}^{(l)}$    beta abundance diversity at level l of the hierarchy
$D_{\gamma}^{(l)}$    gamma abundance diversity at level l of the hierarchy
$l$    superscript to denote population/community, subregion, region, …
$D_{\gamma}$    gamma abundance diversity at the ecosystem level
$N_{injk}$    number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$    total number of copies of allele/species i in population/community j of region k
$N_{+jk}$    total number of alleles/individuals in population j of region k
$N_{++k}$    total number of alleles/individuals in region k
$N_{+++}$    total number of alleles/individuals in the ecosystem
$p_{i|jk}$    frequency of allele/species i in population j of region k
$p_{i|+k}$    frequency of allele/species i in region k
$p_{i|++}$    frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$    alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$    alpha entropy for region k
$H_{\alpha}^{(l)}$    total alpha entropy at level l of the hierarchy
$\Delta_{D}^{(l)}$    abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$    number of branch segments in the phylogenetic tree
$L_i$    length of branch i = 1, 2, 3, …, B
$a_i$    total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$    mean branch length
$T$    depth of an ultrametric tree (= $\bar{T}$)
$Q$    Rao's quadratic entropy
$I$    phylogenetic entropy
$a_{i|jk}$    total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$    total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$    total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_{\alpha}^{(l)}$    alpha phylogenetic diversity at level l of the hierarchy
$PD_{\beta}^{(l)}$    beta phylogenetic diversity at level l of the hierarchy
$PD_{\gamma}^{(l)}$    gamma phylogenetic diversity at level l of the hierarchy
$PD_{\gamma}$    gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$    alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$    alpha phylogenetic entropy for region k
$I_{\alpha}^{(l)}$    total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$    phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)
APPENDIX S3. Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j'=1}^{J} w_{j'} p_{i|j'}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$
Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.
Proof. Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\Big(\sum_{j'=1}^{J} w_{j'} p_{i|j'}\Big)\Big] \\
&\le -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\big(w_j p_{i|j}\big)\Big] \\
&= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j \\
&= H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.
\end{aligned}$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \qquad \text{(C1)}$$
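As an illustration of Eq. (C1) and of the two limiting cases in the theorem above, the following minimal sketch computes the two-level Shannon differentiation for a matrix of allele frequencies. The function names `shannon` and `shannon_diff`, the weights, and the two test matrices are assumptions made for this example only.

```r
# Shannon entropy of a frequency vector, with the convention 0 * log(0) = 0.
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Two-level Shannon differentiation (Eq. C1).
# P: alleles x populations matrix of relative frequencies (columns sum to 1).
# w: population weights (sum to 1).
shannon_diff <- function(P, w) {
  H_gamma <- shannon(as.vector(P %*% w))      # entropy of pooled frequencies
  H_alpha <- sum(w * apply(P, 2, shannon))    # weighted mean within-population entropy
  (H_gamma - H_alpha) / shannon(w)            # normalize by -sum(w * log(w))
}

w <- c(0.2, 0.3, 0.5)

# Three populations with identical allele frequency distributions.
identical_pops <- matrix(c(0.6, 0.3, 0.1), nrow = 3, ncol = 3)

# Three populations with no shared alleles (two private alleles each).
distinct_pops <- rbind(diag(3) * 0.7, diag(3) * 0.3)

d_identical <- shannon_diff(identical_pops, w)
d_distinct  <- shannon_diff(distinct_pops, w)
print(c(identical = d_identical, distinct = d_distinct))
```

Up to floating-point error, the identical case gives 0 and the completely distinct case gives 1, for any choice of weights.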
This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.
(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here, the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: if multiple communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.
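Part (B) can be checked numerically. The sketch below is a hedged illustration, assuming equal community weights (the statement above does not say which weights are used); the helper `make_communities` and the values of J, S, and A are hypothetical.

```r
# Shannon entropy with the convention 0 * log(0) = 0.
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Build J communities, each with S equally common species,
# of which A are shared by all communities and S - A are private.
make_communities <- function(J, S, A) {
  n_species <- A + J * (S - A)
  P <- matrix(0, nrow = n_species, ncol = J)
  for (j in seq_len(J)) {
    own <- A + (j - 1) * (S - A) + seq_len(S - A)   # rows private to community j
    P[c(seq_len(A), own), j] <- 1 / S
  }
  P
}

J <- 4; S <- 10; A <- 3
P <- make_communities(J, S, A)
w <- rep(1 / J, J)                                   # assumed equal weights

H_gamma <- shannon(as.vector(P %*% w))
H_alpha <- sum(w * apply(P, 2, shannon))
delta   <- (H_gamma - H_alpha) / shannon(w)
print(c(delta = delta, expected = 1 - A / S))
```

With these settings the computed differentiation agrees with 1 − A/S = 0.7.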
Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \qquad \text{(C2)}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived, so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C3)}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \qquad \text{(C4)}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions, with relative frequency $p_{i|+k}$ for allele i in region k, and region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} w_{+k} p_{i|+k}\Big) \ln\Big(\sum_{k'=1}^{K} w_{+k'} p_{i|+k'}\Big).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.
To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations, with relative frequency $p_{i|jk}$ for allele i in population j, and population weights $(w_{1k}/w_{+k}, w_{2k}/w_{+k}, \ldots, w_{J_k k}/w_{+k})$, j = 1, 2, …, $J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \qquad \text{(C5)}$$
Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \qquad \text{(C6)}$$
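To make Eqs. (C2), (C5), and (C6) concrete, the following minimal sketch evaluates them on the allele counts of the worked example that accompanies the R code later in this supplement (five populations in two regions). It assumes population weights equal to relative population sizes; the variable names are illustrative and this is not the authors' iDIP function.

```r
# Shannon entropy with the convention 0 * log(0) = 0.
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Allele counts (rows) in five populations (columns), from the worked example.
counts <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
                c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)                        # region of each population

w_jk <- colSums(counts) / sum(counts)             # population weights (assumed: relative sizes)
w_k  <- sapply(split(w_jk, region), sum)          # region weights w_{+k}, named "1" and "2"

P_pop <- sweep(counts, 2, colSums(counts), "/")   # p_{i|jk}
region_counts <- t(rowsum(t(counts), region))     # pooled counts per region
P_reg <- sweep(region_counts, 2, colSums(region_counts), "/")   # p_{i|+k}

H_gamma  <- shannon(rowSums(counts) / sum(counts))
H_alpha2 <- sum(w_k * apply(P_reg, 2, shannon))
H_alpha1 <- sum(w_jk * apply(P_pop, 2, shannon))

max_among_regions <- shannon(w_k)                                  # bound in Eq. C3
max_within_region <- -sum(w_jk * log(w_jk / w_k[as.character(region)]))  # bound in Eq. C4

delta2 <- (H_gamma - H_alpha2) / max_among_regions    # Eq. C5
delta1 <- (H_alpha2 - H_alpha1) / max_within_region   # Eq. C6
print(round(c(delta2 = delta2, delta1 = delta1), 3))
```

The two values agree, to rounding, with the Differentiation2 and Differentiation1 entries reported for this example in the iDIP output, and both components stay within the bounds of Theorem S3.3.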
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\big(w_{jk} p_{i|jk}\big)\Big] \\
&= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big(\ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big)\Big] \\
&= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk}\Big] - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \frac{w_{jk}}{w_{+k}}\Big] - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k}\Big] \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big].
\end{aligned}$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy
For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[I_\alpha^{(2)} - I_\alpha^{(1)}\big] + \big[I_\gamma - I_\alpha^{(2)}\big]. \qquad \text{(C7)}$$
Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem, so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C8)}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \qquad \text{(C9)}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represents the ecosystem. Shannon entropies, H∗, are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1: diagram with columns "Within", "Between", "Total", and "Decomposition" for levels 3 (Ecosystem), 2 (Region), and 1 (Population or Community); image not reproduced.]
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/alleles abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5, p_1 + p_2, p_1 + p_2 + p_3, p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2: tree with tips p1 to p5, internal nodes p1 + p2, p1 + p2 + p3, and p4 + p5, joined at the MRCA; image not reproduced.]
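The extended abundance set of Figure S2 can be sketched in a few lines. The tip abundances and branch lengths below are hypothetical values chosen for illustration (the figure does not give them); the formulas $\bar{T} = \sum_i L_i a_i$ and $I = -\sum_i L_i a_i \ln a_i$ follow the definitions of mean branch length and phylogenetic entropy used in this supplement.

```r
# Hypothetical relative abundances of the five tips (sum to 1).
p <- c(0.30, 0.10, 0.20, 0.25, 0.15)

# Extended abundance set: the five tips, then the internal nodes
# (p1 + p2), (p1 + p2 + p3), and (p4 + p5).
a <- c(p, p[1] + p[2], p[1] + p[2] + p[3], p[4] + p[5])

# Hypothetical branch lengths in the same order, chosen so that
# every tip lies at depth T = 10 (an ultrametric tree).
L <- c(3, 3, 6, 5, 5, 3, 4, 5)

T_bar <- sum(L * a)             # mean branch length
I_phy <- -sum(L * a * log(a))   # phylogenetic entropy

print(c(T_bar = T_bar, I = I_phy))
```

Because the hypothetical tree is ultrametric, the abundance-weighted mean branch length equals the tree depth regardless of the tip abundances.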
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1: Population 1, Population 2
    Region 2: Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):
Pop1 Pop2 Pop3 Pop4 Pop5
Allele1 1 16 2 10 15
Allele2 0 0 0 5 14
Allele3 7 12 11 1 0
Allele4 0 5 14 1 21
Allele5 2 1 0 11 10
Allele6 0 1 3 2 0
The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure of any number of levels can be expressed in a similar manner.
Pop1 Pop2 Pop3 Pop4 Pop5
Level3 Ecosystem Ecosystem Ecosystem Ecosystem Ecosystem
Level2 Region1 Region1 Region2 Region2 Region2
Level1 Population1 Population2 Population3 Population4 Population5
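Since iDIP requires an abundance matrix without blank or NA entries and with at least one species/allele per population, a small pre-check can be useful before calling it. The helpers `prepare_abun` and `check_struc` below are hypothetical (they are not part of the authors' code) and assume that blanks have been read into R as NA.

```r
# Hypothetical helper: replace NA by 0 and verify the abundance matrix.
prepare_abun <- function(abun) {
  abun <- as.matrix(abun)
  stopifnot(is.numeric(abun))
  abun[is.na(abun)] <- 0                      # blanks read as NA become 0
  stopifnot(all(abun >= 0))
  if (any(colSums(abun) == 0))
    stop("every population/community needs at least one species/allele")
  abun
}

# Hypothetical helper: the structure matrix must have one column per
# population and a single ecosystem label in its first row.
check_struc <- function(abun, struc) {
  stopifnot(ncol(struc) == ncol(abun))
  stopifnot(length(unique(struc[1, ])) == 1)
  invisible(TRUE)
}

# Small illustration with two populations and some missing entries.
raw   <- cbind(c(1, NA, 7, NA, 2, NA), c(16, 0, 12, 5, 1, 1))
Abun  <- prepare_abun(raw)
Struc <- rbind(rep("Ecosystem", 2), paste0("Population", 1:2))
check_struc(Abun, Struc)
```

Replacing NA by 0 is only appropriate when a missing entry really means that the allele or species was not observed in that population.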
For the above simple example, input data for R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3), c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
Struc = rbind(rep("Ecosystem", 5), c(rep("Region1", 2), rep("Region2", 3)), paste0("Population", 1:5))

# Run R function
iDIP(Data, Struc)
Output is given below.

                    [,1]
D_gamma            5.272
D_alpha2           4.679
D_alpha1           3.540
D_beta2            1.127
D_beta1            1.322
Proportion2        0.300
Proportion1        0.700
Differentiation2   0.204
Differentiation1   0.310
We give simple interpretations for the above output (each "effective" number is in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 × 1.127 = 5.272 (= D_gamma).
(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 × 3.540 = 4.679 allele equivalents per region (= D_alpha2).
(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
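The arithmetic behind these interpretations can be retraced from the printed (rounded) output alone. This is a minimal sketch, assuming that the proportions are computed on the log (entropy) scale, as the differences of entropies in the iDIP code suggest; small discrepancies are due to rounding of the printed values.

```r
# Rounded values copied from the iDIP output of the worked example.
out <- c(D_gamma = 5.272, D_alpha2 = 4.679, D_alpha1 = 3.540,
         D_beta2 = 1.127, D_beta1 = 1.322)

# Multiplicative partitioning: gamma = alpha1 * beta1 * beta2.
gamma_from_parts <- out[["D_alpha1"]] * out[["D_beta1"]] * out[["D_beta2"]]

# Share of the total beta information at each level, on the log scale.
log_beta   <- log(c(region = out[["D_beta2"]], population = out[["D_beta1"]]))
proportion <- log_beta / sum(log_beta)

print(round(gamma_from_parts, 3))
print(round(proportion, 3))
```

The recomputed shares are close to the reported Proportion2 and Proportion1 values of 0.300 and 0.700.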
# R code for Information-based Diversity Partitioning for Any Number of Levels
#
# Input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
#
# Output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
#   (2) Proportion of total beta information found at each level.
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1 measures the mean
#       dissimilarity among populations (level 1) within a region; Differentiation2 measures the mean
#       dissimilarity among regions (level 2), etc.
#
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank" or "NA" in your data
#       by 0 or any numerical value. Also, there must be at least one species/allele in a population
#       or a community.
MainfunctioniDIP
iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga > 0] / n
  G = sum(-gp * log(gp))                     # gamma (total) Shannon entropy
  S = length(gp)
  H = nrow(struc)                            # number of hierarchical levels
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)
  # Lowest level: individual populations/communities
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H - 1] = sum(wi * Ai)
  # Intermediate levels: pool the populations belonging to each aggregate
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          ai[, j] = abun[, II]
        } else {
          ai[, j] = rowSums(abun[, II])
        }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i - 1] = sum(wi * Ai)
    }
  }
  total = G - A[H - 1]                       # total beta information
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }
  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha", (H - 1):1),
                     paste0("D_beta", (H - 1):1),
                     paste0("Proportion", (H - 1):1),
                     paste0("Differentiation", (H - 1):1))
  return(out)
}
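For the simplest case of two levels (ecosystem and populations), the quantities returned by the function above reduce to a few lines. The following is a minimal sketch under that assumption, not the authors' code; the toy matrix `abun` and the helper `shannon` are hypothetical names introduced only for illustration.

```r
# Hypothetical toy data: 4 alleles (rows) in 3 populations (columns).
abun <- cbind(c(10, 5, 0, 0), c(8, 7, 0, 0), c(0, 0, 6, 9))

# Shannon entropy of a vector of relative frequencies (zeros are skipped).
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

n  <- sum(abun)
w  <- colSums(abun) / n                      # population weights
Hg <- shannon(rowSums(abun) / n)             # gamma entropy (pooled frequencies)
Ha <- sum(w * apply(abun, 2, function(x) shannon(x / sum(x))))  # weighted alpha entropy

D_gamma <- exp(Hg)                           # effective number of alleles overall
D_alpha <- exp(Ha)                           # effective number of alleles per population
D_beta  <- D_gamma / D_alpha                 # effective number of populations
Delta   <- (Hg - Ha) / shannon(w)            # differentiation, normalized to [0, 1]

print(round(c(D_gamma = D_gamma, D_alpha = D_alpha, D_beta = D_beta, Delta = Delta), 3))
```

With more than two levels, the same ratios are applied between successive pairs of levels, which is what the loops in the function handle.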
Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) described for species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.
InputdataData=cbind(c(107020)c(16012511)c(20111403)c(10511112)c(1514021100))
rownames(Data) = paste0("Allele", 1:6)
Struc = rbind(rep("Ecosystem", 5), c(rep("Region1", 2), rep("Region2", 3)), paste0("Community", 1:5))
Tree=c((((Allele11666254448Allele22886156926)4370264926Allele35919367445)4349065302(Allele4967060281Allele54965919121Allele615361314)5492297125))
Run R function: iDIPphylo(Data, Struc, Tree)
Output is given below:
               [,1]
Faith's PD  321.525
mean_T       94.169
PD_gamma    274.388
PD_alpha2   255.194
PD_alpha1   223.231
PD_beta2      1.075
PD_beta1      1.143
PD_prop2      0.351
PD_prop1      0.649
PD_diff2      0.124
PD_diff1      0.149
The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma), up to rounding of the reported values.
(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2), up to rounding of the reported values.
(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
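The internal consistency of the reported output can be checked with a few lines of arithmetic. The sketch below simply re-enters the printed values (so agreement is expected only up to rounding) and uses the relationships implied by the function: alpha times beta at one level gives the diversity of the level above, and the proportion of beta information at a level is the log of its beta divided by the sum of the logs of all betas. The vector name `pd` is introduced here for illustration.

```r
# Values copied from the printed output above (rounded to three decimals).
pd <- c(gamma = 274.388, alpha2 = 255.194, alpha1 = 223.231,
        beta2 = 1.075, beta1 = 1.143, prop2 = 0.351, prop1 = 0.649)

# Multiplicative partitioning: alpha x beta at a level recovers the level above.
gamma_from_level2  <- unname(pd["alpha2"] * pd["beta2"])
alpha2_from_level1 <- unname(pd["alpha1"] * pd["beta1"])

# Share of total beta information at level 2: ln(beta2) / (ln(beta2) + ln(beta1)).
prop2_from_betas <- unname(log(pd["beta2"]) / (log(pd["beta2"]) + log(pd["beta1"])))

print(round(c(gamma_from_level2, alpha2_from_level1, prop2_from_betas), 3))
```

The three printed numbers should match PD_gamma, PD_alpha2 and PD_prop2 to within the rounding of the inputs.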
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

# Input data:
#   abun:  alleles-by-population frequency matrix (or species-by-community frequency matrix).
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix).
#   tree:  a Newick-format phylogenetic tree spanned by all focal species considered in a study.
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data
#       by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
# Output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root node to each of the
#       tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level.
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information found in
#       Level 1 and Level 2, respectively.
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1 measures the mean
#       phylogenetic dissimilarity among communities (Level 1) within a region (Level 2); PD_diff2 measures
#       the mean phylogenetic dissimilarity among regions (Level 2), etc.
# Three packages, "ade4", "ape" and "phytools", must be installed first.

install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

# Main function iDIPphylo
iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, b] = 1 if branch/node b lies on the path from the root to leaf i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }
  # pA: abundance descending from each branch, per population/community
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
  pB = c(phyloData$leaves, phyloData$nodes)  # branch lengths
  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n; TT = sum(gp * pB)             # TT: abundance-weighted mean root-to-tip distance
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])                       # Faith's PD
  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H - 1] = sum(wi * Ai)
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else {
          pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
        }
      }
      wi = ni / sum(ni)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i - 1] = sum(wi * Ai)
    }
  }
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }
  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H - 1):1),
                     paste0("PD_beta", (H - 1):1),
                     paste0("PD_prop", (H - 1):1),
                     paste0("PD_diff", (H - 1):1))
  return(out)
}
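The entropy-like quantity G computed in iDIPphylo can be related to the phylogenetic diversity formula of Table 3 by a short derivation. It is offered here as a reading aid rather than as part of the authors' supplement. In the notation below, $L_i$ corresponds to the branch lengths `pB`, $a_i$ to the pooled branch abundances `gp`, and $T$ to `TT`, and the only fact used is that $\sum_i L_i a_i = T$ by the definition of the mean root-to-tip distance.

```latex
\begin{aligned}
G &= -\sum_{i=1}^{B} L_i \,\frac{a_i}{T}\,\ln\frac{a_i}{T}
   = -\frac{1}{T}\sum_{i=1}^{B} L_i a_i \ln a_i
     + \frac{\ln T}{T}\sum_{i=1}^{B} L_i a_i
   = \frac{I_\gamma}{T} + \ln T, \\[4pt]
e^{G} &= T \times \exp\!\left(I_\gamma / T\right) = PD_\gamma .
\end{aligned}
```

The same argument applies to each alpha term whenever the mean root-to-tip distance within an aggregate equals $T$, which holds for an ultrametric tree.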
given in Table 3, along with the corresponding differentiation measures. Appendix S3 presents all mathematical derivations and discusses the desirable monotonicity and "true dissimilarity" properties that our proposed differentiation measures possess.
5 | IMPLEMENTATION OF THE FRAMEWORK BY MEANS OF AN R PACKAGE

The framework described above has been implemented in the R function iDIP (information-based Diversity Partitioning), which is provided as Data S1. We also provide a short introduction with a simple example data set to explain how to obtain numerical results equivalent to those provided in Tables 4 and 5 below for the Hawaiian archipelago example data set.

The R function iDIP requires two input matrices:

1. Abundance data: specifying species/alleles (rows) raw or relative abundances for each population/community (columns).
2. Structure matrix: describing the hierarchical structure of spatial subdivision; see a simple example given in Data S1. There is no limit to the number of spatial subdivisions.

The output includes (i) gamma (or total) diversity, alpha and beta diversity for each level; (ii) proportion of total beta information (among aggregates) found at each level; and (iii) mean differentiation (dissimilarity) at each level.

We also provide the R function iDIPphylo, which implements an information-based decomposition of phylogenetic diversity and therefore can take into account the evolutionary history of the species being studied. This function requires the two matrices mentioned above plus a phylogenetic tree in Newick format. For interested users without knowledge of R, we also provide an online version available from https://chao.shinyapps.io/iDIP/. This interactive web application was developed using Shiny (https://shiny.rstudio.com). The webpage contains tabs providing a short introduction describing how to use the tool, along with a detailed User's Guide, which provides proper interpretations of the output through numerical examples.
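As an illustration of the two inputs, the sketch below builds a small abundance matrix and a matching three-level structure matrix, and applies the checks implied by the requirements stated in the supplement (no missing entries, no empty community). The counts are hypothetical and chosen only for illustration; the structure layout follows the simple five-community example used in the supplement.

```r
# Hypothetical raw counts: 4 alleles (rows) x 5 communities (columns), one entry missing.
# matrix() fills column by column, so each line below is one community.
abun <- matrix(c(12, 0, 3, NA,
                  7, 5, 0, 1,
                  0, 9, 4, 2,
                  3, 3, 3, 3,
                  0, 0, 8, 6), nrow = 4,
               dimnames = list(paste0("Allele", 1:4), paste0("Community", 1:5)))

# Structure matrix: one row per hierarchical level (top to bottom), one column per community.
struc <- rbind(rep("Ecosystem", 5),
               c(rep("Region1", 2), rep("Region2", 3)),
               paste0("Community", 1:5))

abun[is.na(abun)] <- 0                 # missing entries must be replaced before analysis
stopifnot(ncol(abun) == ncol(struc))   # columns of both matrices must correspond
stopifnot(all(colSums(abun) > 0))      # every community needs at least one allele/species
```

Adding a further level of subdivision amounts to adding one more row to `struc`.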
6 | SIMULATION STUDY TO SHOW THE CHARACTERISTICS OF THE FRAMEWORK

Here, we describe a simple simulation study to demonstrate the utility and numerical behaviour of the proposed framework. We considered an ecosystem composed of 32 populations divided into four hierarchical levels (ecosystem, region, subregion, population; Figure 1). The number of populations at each level was kept constant across all simulations (i.e., ecosystem with 32 populations, regions with 16 populations each, and subregions with eight
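TABLE 3 Formulas for α, β and γ, along with differentiation measures, at each hierarchical level of spatial subdivision, for species/allelic diversity and phylogenetic diversity. Here D = ¹D (Hill number of order q = 1 in Equation 2), PD = ¹PD (phylogenetic diversity of order q = 1 in Equation 5), T denotes the depth of an ultrametric tree, H = Shannon entropy (Equation 2), I = phylogenetic entropy (Equation 6).

Hierarchical level | Diversity | Species/allelic diversity | Phylogenetic diversity
Level 3: Ecosystem | gamma | $D_\gamma = \exp\left(-\sum_{i=1}^{S} p_{i++} \ln p_{i++}\right) \equiv \exp(H_\gamma)$ | $PD_\gamma = T \times \exp\left(-\sum_{i=1}^{B} L_i a_{i++} \ln a_{i++} / T\right) \equiv T \times \exp(I_\gamma / T)$
Level 2: Region | gamma | $D_\gamma^{(2)} = D_\gamma$ | $PD_\gamma^{(2)} = PD_\gamma$
Level 2: Region | alpha | $D_\alpha^{(2)} = \exp\left(H_\alpha^{(2)}\right)$, where $H_\alpha^{(2)} = \sum_k w_{+k} H_{\alpha k}^{(2)}$ and $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i+k} \ln p_{i+k}$ | $PD_\alpha^{(2)} = T \times \exp\left(I_\alpha^{(2)} / T\right)$, where $I_\alpha^{(2)} = \sum_k w_{+k} I_{\alpha k}^{(2)}$ and $I_{\alpha k}^{(2)} = -\sum_{i=1}^{B} L_i a_{i+k} \ln a_{i+k}$
Level 2: Region | beta | $D_\beta^{(2)} = D_\gamma^{(2)} / D_\alpha^{(2)}$ | $PD_\beta^{(2)} = PD_\gamma^{(2)} / PD_\alpha^{(2)}$
Level 1: Population or community | gamma | $D_\gamma^{(1)} = D_\alpha^{(2)}$ | $PD_\gamma^{(1)} = PD_\alpha^{(2)}$
Level 1: Population or community | alpha | $D_\alpha^{(1)} = \exp\left(H_\alpha^{(1)}\right)$, where $H_\alpha^{(1)} = \sum_{j,k} w_{jk} H_{\alpha jk}^{(1)}$ and $H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{ijk} \ln p_{ijk}$ | $PD_\alpha^{(1)} = T \times \exp\left(I_\alpha^{(1)} / T\right)$, where $I_\alpha^{(1)} = \sum_{j,k} w_{jk} I_{\alpha jk}^{(1)}$ and $I_{\alpha jk}^{(1)} = -\sum_{i=1}^{B} L_i a_{ijk} \ln a_{ijk}$
Level 1: Population or community | beta | $D_\beta^{(1)} = D_\gamma^{(1)} / D_\alpha^{(1)}$ | $PD_\beta^{(1)} = PD_\gamma^{(1)} / PD_\alpha^{(1)}$

Differentiation among aggregates at each level
Level 2: Among regions | $\Delta_D^{(2)} = \dfrac{H_\gamma - H_\alpha^{(2)}}{-\sum_k w_{+k} \ln w_{+k}}$ | $\Delta_{PD}^{(2)} = \dfrac{I_\gamma - I_\alpha^{(2)}}{-T \sum_k w_{+k} \ln w_{+k}}$
Level 1: Population/community within region | $\Delta_D^{(1)} = \dfrac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{j,k} w_{jk} \ln (w_{jk} / w_{+k})}$ | $\Delta_{PD}^{(1)} = \dfrac{I_\alpha^{(2)} - I_\alpha^{(1)}}{-T \sum_{j,k} w_{jk} \ln (w_{jk} / w_{+k})}$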
populations each). Note that here we used a hierarchy with four spatial subdivisions instead of three levels as used in the presentation of the framework. This decision was based on the fact that we wanted to simplify the presentation of calculations (three levels used), and in the simulations (four levels used) we wanted to verify the performance of the framework in a more in-depth manner.
We explored six scenarios varying in the degree of genetic structuring, from very strong (Figure 2, top left panel) to very weak (Figure 2, bottom right panel), and for each we generated spatially structured genetic data for 10 unlinked bi-allelic loci using an algorithm loosely based on the genetic model of Coop, Witonsky, Di Rienzo, and Pritchard (2010). More explicitly, to generate correlated allele frequencies across populations for bi-allelic loci, we drew 10 random vectors of dimension 32 from a multivariate normal distribution with mean zero and a covariance matrix corresponding to the particular genetic structure scenario being considered. To construct the covariance matrix, we first assumed that the covariance between populations decreased with distance, so that the off-diagonal elements (covariances) for closest geographic neighbours were set to 4, for the second nearest neighbours were set to 3, and so on; as such, the main diagonal values (variance) were set to 5. By multiplying the off-diagonal elements of this variance-covariance matrix by a constant (δ), we manipulated the strength of the spatial genetic structure from strong (δ = 0.1, Figure 2) to weak (δ = 6, Figure 2). Delta values were chosen to demonstrate gradual changes in estimates across diversity components. Using this procedure, we generated a matrix of random normally distributed N(0,1) deviates ε_il for each population i and locus l. The random deviates were transformed into allele frequencies constrained between 0 and 1 using the simple transform

$$
p_{il} =
\begin{cases}
0 & \text{if } \varepsilon_{il} < 0 \\
\varepsilon_{il} & \text{if } 0 \le \varepsilon_{il} \le 1 \\
1 & \text{if } \varepsilon_{il} > 1
\end{cases}
$$

where p_il is the relative frequency of allele A1 at the l-th locus in population i, and therefore q_il = (1 − p_il) is the relative frequency of allele A2. Each bi-allelic locus was analysed separately by our framework, and estimated values of Dγ, Dα and Dβ for each spatial level (see Figure 1) were averaged across the 10 loci.

To simulate a realistic distribution of number of individuals across populations, we generated random values from a log-normal distribution with mean 0 and log of standard deviation 1; these values were then multiplied by randomly generated deviates from a Poisson distribution with λ = 30 to obtain a wide range of population/community sizes. Rounded values (to mimic abundances of individuals) were then multiplied by p_il and q_il to generate allele abundances. Given that the number of individuals was randomly generated across populations, there is no spatial correlation in abundance of individuals across the landscape, which means that the genetic spatial patterns were solely determined by the variance-covariance matrix used to generate correlated allele frequencies across populations. This facilitates interpretation of the simulation results, allowing us to demonstrate that the framework can uncover subtle spatial effects associated with population connectivity (see below).

For each spatial structure, we generated 100 matrices of allele frequencies, and each matrix was analysed separately to obtain distributions for Dγ, Dα, Dβ and ΔD. Figure 2 presents heat maps of the correlation in allele frequencies across populations for one simulated data set under each δ value and shows that our algorithm can generate a wide range of genetic structures comparable to those generated by other, more complex simulation protocols (e.g., de Villemereuil, Frichot, Bazin, Francois, & Gaggiotti, 2014).

Figure 3 shows the distribution of Dα, Dβ and ΔD values for the three levels of geographic variation below the ecosystem level (i.e., Dγ, genetic diversity). The results clearly show that our framework detects differences in genetic diversity across different levels of spatial genetic structure. As expected, the effective number of alleles (Dα component, top row) increases per region and subregion as the spatial structure becomes weaker (i.e., from small to large δ values) but remains constant at the population level, as there is no spatial structure at this level (i.e., populations are panmictic), so diversity is independent of δ.

The Dβ component (middle row) quantifies the effective number of aggregates (regions, subregions, populations) at each hierarchical level of spatial subdivision. The larger the number of aggregates at a given level, the more heterogeneous that level is. Thus, it is also a measure of compositional dissimilarity at each level. We use this interpretation to describe the results in a more intuitive manner. As expected, as δ increases, dissimilarity between regions (middle left panel) decreases because spatial genetic structure becomes weaker, and the compositional dissimilarity among populations within subregions (middle right panel) increases because the strong spatial correlation among populations within subregions breaks down (Figure 3, centre left panel). The compositional dissimilarity between subregions within regions (Figure 3, middle centre panel) first increases and then decreases with increasing δ. This is due to an "edge effect" associated with the marginal status of the subregions at the extremes of the species range (extreme right and left subregions in Figure 2). As δ increases, the composition of the two subregions at the centre of the species range, which belong to different regions, changes more rapidly than that of the two marginal subregions. Thus, the compositional dissimilarity between subregions within regions increases. However, as δ continues to increase, spatial effects disappear and dissimilarity decreases.
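The data-generating steps of the simulation (covariance matrix, correlated deviates, clamping transform and population sizes) can be sketched in base R as follows. This is an illustrative reconstruction, not the authors' simulation code. Two choices are assumptions made here: populations are placed on a line so that "distance" is the difference in index, with covariances floored at zero, and negative eigenvalues of the scaled matrix (which can arise for large δ) are clipped to zero before sampling. All object names are hypothetical.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility only
n_pop <- 32; n_loci <- 10; delta <- 0.1

# ASSUMPTION: populations lie on a line; covariance drops by 1 per step and stops at 0.
d     <- abs(outer(1:n_pop, 1:n_pop, "-"))
Sigma <- pmax(5 - d, 0)                       # variance 5, then 4, 3, ... for neighbours
Sigma[d > 0] <- delta * Sigma[d > 0]          # scale only the off-diagonal elements

# Correlated normal deviates via an eigen-decomposition of Sigma.
# ASSUMPTION: negative eigenvalues are clipped to zero so that sampling is always possible.
e   <- eigen(Sigma, symmetric = TRUE)
rt  <- e$vectors %*% diag(sqrt(pmax(e$values, 0)))
eps <- rt %*% matrix(rnorm(n_pop * n_loci), n_pop, n_loci)   # populations x loci

p <- pmin(pmax(eps, 0), 1)                    # clamp deviates to [0, 1]: frequency of A1
q <- 1 - p                                    # frequency of A2

# Population sizes: log-normal deviates times Poisson(30) deviates, rounded.
size    <- round(rlnorm(n_pop, meanlog = 0, sdlog = 1) * rpois(n_pop, lambda = 30))
abun_A1 <- size * p                           # allele abundances, populations x loci
abun_A2 <- size * q
```

Each column of `abun_A1` and `abun_A2` gives the two allele abundances for one locus across the 32 populations. A population drawn with size zero would need to be redrawn before analysis, since the decomposition requires at least one allele per population.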
FIGURE 2 Heat maps of allele frequency correlations between pairs of populations for different δ values. Delta values control the strength of the spatial genetic structure among populations, with low δs having the strongest spatial correlation among populations. Each heat map represents the outcome of a single simulation, and each dot represents the allele frequency correlation between two populations. Thus, the diagonal represents the correlation of a population with itself and is always 1, regardless of the δ value considered in the simulation. Colours indicate range of correlation values. As in Figure 1, the dendrograms represent the spatial relationship (i.e., geographic distance) between populations.
The differentiation component ΔD (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity Dβ. This is expected, as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix
FIGURE 3 Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity, gamma, is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions).
in which different δ values would be used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength in spatial genetic variation.

For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ: Dγ = 1.6 on average across simulations for δ = 0.1, up to Dγ = 1.9 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here region, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.
7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM

All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish, Etelis coruscans (Andrews et al., 2014), and a shallow-water fish, Zebrasoma flavescens (Eble et al., 2011).

The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string
FIGURE 4 Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass.
of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km southwest of the MHI, at Johnston Atoll, and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).

We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.
7.1 | Species diversity

Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, Dγ, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_\alpha^{(2)} = 37.77$; Island: $D_\alpha^{(1)} = 27.75$). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_\beta^{(2)} = 1.29$ represents the number of region equivalents in the Hawaiian archipelago, while $D_\beta^{(1)} = 1.361$ is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on the sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain ΔD (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than those of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and, to a lesser extent, Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
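The way diversities of different orders respond to rare and dominant species can be illustrated with a short sketch of Hill numbers. The helper `hill` and the two abundance vectors are hypothetical and are not the Hawaiian data: the first community has one species at 55% of total density and 30 rare species sharing the remainder, the second has 31 equally abundant species.

```r
# Hill number of order q for a vector of (relative) abundances.
hill <- function(p, q) {
  p <- p[p > 0] / sum(p)
  if (q == 1) exp(-sum(p * log(p))) else sum(p^q)^(1 / (1 - q))
}

# Hypothetical communities with the same richness but contrasting evenness.
p_dom  <- c(0.55, rep(0.45 / 30, 30))   # one dominant species, 30 rare ones
p_even <- rep(1 / 31, 31)               # perfectly even community

profile <- sapply(c(0, 1, 2), function(q)
  c(dominated = hill(p_dom, q), even = hill(p_even, q)))
colnames(profile) <- paste0("q=", 0:2)
print(round(profile, 2))
```

For the even community all three orders equal the richness, whereas for the dominated community the diversities drop sharply as q increases, which is the signature used in the text to detect rare species and dominance.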
Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.
7.2 | Genetic diversity

Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed
TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

Level | Diversity
3 Hawaiian Archipelago | $D_\gamma = 48.744$
2 Region | $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)} = 37.773$, $D_\beta^{(2)} = 1.290$
1 Island (community) | $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)} = 27.752$, $D_\beta^{(1)} = 1.361$

Differentiation among aggregates at each level
2 Region | $\Delta_D^{(2)} = 0.290$
1 Island (community) | $\Delta_D^{(1)} = 0.153$
among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that, despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and therefore may lead to large differences in larval dispersal potential between the two species.

Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.

As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).

8 | DISCUSSION

Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications, including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.

Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and therefore give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g.,
FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities. (b) Genetic diversity for Etelis coruscans. (c) Genetic diversity for Zebrasoma flavescens
bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts Hill numbers with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.

As proved by Chao et al. (2015, appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population, and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost's D, do not possess either of these two properties.

Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level, with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and
FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness. (b) Allelic richness for Etelis coruscans. (c) Allelic richness for Zebrasoma flavescens
TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level | Diversity
3 Hawaiian Archipelago | $D_\gamma = 8.249$
2 Region | $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)} = 8.083$, $D_\beta^{(2)} = 1.016$
1 Island (population) | $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)} = 7.077$, $D_\beta^{(1)} = 1.117$

Differentiation among aggregates at each level
2 Region | $\Delta_D^{(2)} = 0.023$
1 Island (community) | $\Delta_D^{(1)} = 0.062$
our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.
Recently, Karlin and Smouse (2017, Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
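The two-population example can be verified numerically with the two-level form of the differentiation measure in Table 3. The sketch below assumes equal population weights, which the example implies but does not state explicitly; the helper `shannon` and the vector names are introduced here for illustration.

```r
# Shannon entropy of a vector of relative frequencies (zeros are skipped).
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# 16 alleles in total: 4 shared, 6 private to population I, 6 private to population II.
pop1 <- c(rep(0.1, 4), rep(0.1, 6), rep(0,   6))
pop2 <- c(rep(0.1, 4), rep(0,   6), rep(0.1, 6))
w    <- c(0.5, 0.5)                            # ASSUMPTION: equal population weights

H_gamma <- shannon(w[1] * pop1 + w[2] * pop2)  # entropy of the pooled frequencies
H_alpha <- w[1] * shannon(pop1) + w[2] * shannon(pop2)
Delta   <- (H_gamma - H_alpha) / shannon(w)    # two-level differentiation
print(Delta)
```

Here H_gamma − H_alpha equals 0.6 ln 2 and the denominator equals ln 2, so the measure returns the nonshared proportion 1 − 4/10 = 0.6, up to floating-point error.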
In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown, and all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would surely be observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., the value obtained by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
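The negative bias of the observed (plug-in) entropy can be illustrated with a small simulation. This is only a sketch under assumed settings (a hypothetical community of 10 equally abundant species, samples of 20 individuals, a fixed random seed); it does not implement the Chao and Jost (2015) estimator, it only shows the bias that such estimators are meant to reduce.

```python
import math
import random
from collections import Counter

def plugin_entropy(sample):
    """Observed (plug-in) Shannon entropy from a list of sampled labels."""
    n = len(sample)
    return -sum((c / n) * math.log(c / n) for c in Counter(sample).values())

random.seed(1)  # fixed seed so the run is repeatable
n_species, sample_size, n_reps = 10, 20, 500   # hypothetical settings
true_entropy = math.log(n_species)             # equally abundant species

# Draw repeated small samples and compute the plug-in entropy of each.
estimates = [
    plugin_entropy([random.randrange(n_species) for _ in range(sample_size)])
    for _ in range(n_reps)
]
mean_estimate = sum(estimates) / n_reps

print(f"true entropy        = {true_entropy:.3f}")
print(f"mean plug-in value  = {mean_estimate:.3f}")
print(f"true diversity      = {math.exp(true_entropy):.2f}")
print(f"observed diversity  = {math.exp(mean_estimate):.2f}")
```

For an even community, no sample can yield a plug-in entropy above the true value, so the average necessarily falls below it; the gap shrinks as the sample size grows.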
Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity D_β and differentiation Δ_D measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the Results section.
The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species and, therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.
Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural
TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci.
Level                      Diversity
3 Hawaiian Archipelago     D_γ = 8.404
2 Region                   D_γ^(2) = D_γ; D_α^(2) = 8.290; D_β^(2) = 1.012
1 Island (community)       D_γ^(1) = D_α^(2); D_α^(1) = 7.690; D_β^(1) = 1.065
Differentiation among aggregates at each level
2 Region                   Δ_D^(2) = 0.014
1 Island (community)       Δ_D^(1) = 0.033
populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.
We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides “true diversity” measures that are consistent across levels of organization and, therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
In our hierarchical framework and analysis based on Hill numbers of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al.'s (2010) phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al.'s (2010) measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals, or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
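As a rough numerical illustration of the phylogenetic entropy mentioned above, the sketch below uses the symbols of Appendix S2 (branch lengths L_i, descendant relative abundances a_i, tree depth T, phylogenetic entropy I). It assumes that the “simple transformation” for q = 1 is exp(I/T), which is our reading of Chao et al. (2010) rather than a formula stated in this section; multiplying by T would express the result in branch-length units. The two trees are hypothetical.

```python
import math

def phylogenetic_entropy(branches):
    """I = -sum_i L_i * a_i * ln(a_i).

    branches -- list of (L_i, a_i): branch length and total relative
                abundance of the species descended from that branch.
    """
    return -sum(length * a * math.log(a) for length, a in branches if a > 0)

def mean_phylogenetic_diversity_q1(branches, tree_depth):
    """Assumed transformation exp(I / T) for an ultrametric tree of depth T."""
    return math.exp(phylogenetic_entropy(branches) / tree_depth)

T = 2.0
# Star tree: four equally abundant species, each on its own branch of length T.
star = [(2.0, 0.25)] * 4
# Balanced tree ((A,B),(C,D)): four terminal branches of length 1 and
# two internal branches of length 1, each internal branch carrying half
# of the total abundance.
balanced = [(1.0, 0.25)] * 4 + [(1.0, 0.5)] * 2

star_div = mean_phylogenetic_diversity_q1(star, T)
balanced_div = mean_phylogenetic_diversity_q1(balanced, T)
print(star_div, balanced_div)
```

In the star tree all species are equally distinct, so the value coincides with the ordinary Hill number of order 1 (four species); shared ancestry in the balanced tree lowers the effective number of lineages.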
There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, the information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both the distance metric and the clustering algorithm may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and “true dissimilarity” properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.
The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius, Gillet, and Ziehe (2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.
An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.
ACKNOWLEDGEMENTS
This work was assisted through participation in the “Next Generation Genetic Monitoring” Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from the NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by the National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.
DATA ARCHIVING STATEMENT
All data used in this manuscript are available in DRYAD (https://doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).
ORCID
Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493
Christine Edwards http://orcid.org/0000-0001-8837-4872
REFERENCES
Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2
Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101
Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.
Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis “marshi” (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665
Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.
Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.
Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1 (edited by S. E. Fienberg), 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654
Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551
Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247
Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272
Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45 (edited by D. J. Futuyma), 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540
Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471
Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014
Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819
Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083
Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1
Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2
Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036
Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75
Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352
Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x
Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x
Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1
Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x
Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum × falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.
Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.
Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.
Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3
Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x
Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.
Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x
Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x
Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x
Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.
Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x
Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.
Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.
Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696
Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B: Biological Sciences, 283(1829).
Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a “pristine” coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z
Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765
Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x
Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.
Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422
Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756
de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.
Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x
Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516
Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.
Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.
How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593
APPENDIX S1 Effective number of alleles – A simple example
Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an “ideal” population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the “effective” number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
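The equivalence in the table can be verified directly. The sketch below assumes the standard heterozygosity-based definition, effective number of alleles = 1/(1 − He) = 1/Σp², applied to the frequencies listed above; the function names are illustrative.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def effective_alleles(freqs):
    """Number of equally frequent alleles giving the same He: 1 / (1 - He)."""
    return 1.0 / (1.0 - expected_heterozygosity(freqs))

pop1 = [0.5, 0.5, 0.0, 0.0, 0.0]
pop2 = [0.01, 0.10, 0.665, 0.215, 0.01]

for name, freqs in (("Population 1", pop1), ("Population 2", pop2)):
    print(name,
          "He =", round(expected_heterozygosity(freqs), 3),
          "effective alleles =", round(effective_alleles(freqs), 2))
```

Both populations give He of about 0.5 and an effective number of alleles of about two, even though population 2 carries five distinct alleles.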
APPENDIX S2 – Table of parameters and variables used
Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol    Definition
${}^{q}D$    abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$    phylogenetic diversity of order q
$S$    total number of distinct elements (species/alleles)
$H$    Shannon entropy
$K$    number of regions
$J_k$    number of local populations/communities within region k
$w_{jk}$    weight given to population/community j of region k
$w_{+k}$    weight given to region k
$D_\alpha^{(l)}$    alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$    beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$    gamma abundance diversity at level l of the hierarchy
$l$    superscript to denote population/community, subregion, region, …
$D_\gamma$    gamma abundance diversity at the ecosystem level
$N_{injk}$    number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$    total number of copies of allele/species i in population/community j of region k
$N_{+jk}$    total number of alleles/individuals in population j of region k
$N_{++k}$    total number of alleles/individuals in region k
$N_{+++}$    total number of alleles/individuals in the ecosystem
$p_{i|jk}$    frequency of allele/species i in population j of region k
$p_{i|+k}$    frequency of allele/species i in region k
$p_{i|++}$    frequency of allele/species i in the ecosystem
$H_{\alpha,jk}^{(1)}$    alpha entropy for population j of region k
$H_{\alpha,k}^{(2)}$    alpha entropy for region k
$H_\alpha^{(l)}$    total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$    abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$    number of branch segments in the phylogenetic tree
$L_i$    length of branch i = 1, 2, 3, …, B
$a_i$    total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$    mean branch length
$T$    depth of an ultrametric tree ($= \bar{T}$)
$Q$    Rao's quadratic entropy
$I$    phylogenetic entropy
$a_{i|jk}$    total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$    total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$    total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$    alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$    beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$    gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$    gamma phylogenetic diversity at the ecosystem level
$I_{\alpha,jk}^{(1)}$    alpha phylogenetic entropy for population j of region k
$I_{\alpha,k}^{(2)}$    alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$    total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$    phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)
APPENDIX S3 Derivation of Differentiation Measures
We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term “allele frequency” with “species frequency.” We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)
Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as
\[ H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{m=1}^{J} w_m p_{i|m} \Big) \]
and
\[ H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}. \]
Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:
\[ 0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j . \]
When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.
Proof. Since $f(x) = -x \ln x$ is a concave function, it follows from the Jensen inequality that, for any allele i, we have
\[ -\Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j} . \]
Summing over all alleles, we then obtain
\[ -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} . \]
This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows. Because the logarithm is increasing and $\sum_{m=1}^{J} w_m p_{i|m} \ge w_j p_{i|j}$ for every j,
\[ H_\gamma = -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \ln \Big( \sum_{m=1}^{J} w_m p_{i|m} \Big) \Big) \le -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \ln ( w_j p_{i|j} ) \Big) \]
\[ = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j . \]
When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
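The bounds in Theorem S3.1 can be checked numerically for the two limiting cases and for an intermediate one. The sketch below is only an illustration: the allele frequencies and the unequal population weights are hypothetical, and the function names are ours.

```python
import math

def entropy(freqs):
    """Shannon entropy (natural log); zero frequencies contribute nothing."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def entropy_excess(pops, weights):
    """Return (H_gamma - H_alpha, upper bound -sum_j w_j ln w_j)."""
    n_alleles = len(pops[0])
    pooled = [sum(w * pop[i] for w, pop in zip(weights, pops))
              for i in range(n_alleles)]
    h_gamma = entropy(pooled)
    h_alpha = sum(w * entropy(pop) for w, pop in zip(weights, pops))
    return h_gamma - h_alpha, entropy(weights)

weights = [0.2, 0.3, 0.5]  # hypothetical unequal population weights
identical = [[0.7, 0.2, 0.1]] * 3
distinct = [[1, 0, 0, 0, 0, 0], [0, 0.5, 0.5, 0, 0, 0], [0, 0, 0, 0.2, 0.3, 0.5]]
partial = [[0.6, 0.4, 0, 0], [0.1, 0.5, 0.4, 0], [0, 0.2, 0.3, 0.5]]

for label, pops in (("identical", identical),
                    ("distinct", distinct),
                    ("partial", partial)):
    excess, bound = entropy_excess(pops, weights)
    print(f"{label:9s} excess = {excess:.4f}  bound = {bound:.4f}")
```

Identical populations give an excess of zero, completely distinct populations reach the bound, and partially overlapping populations fall strictly in between.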
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as
\[ \Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j} . \qquad \text{(C1)} \]
This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and “true dissimilarity” properties for Shannon differentiation
(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation also satisfies the “true dissimilarity” property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.

Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, the Shannon gamma and alpha entropies are expressed as
\[ H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}, \]
\[ H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} . \]
(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is
\[ H_\gamma = H_\alpha^{(1)} + \big[ H_\alpha^{(2)} - H_\alpha^{(1)} \big] + \big[ H_\gamma - H_\alpha^{(2)} \big] . \qquad \text{(C2)} \]
Here $H_\alpha^{(1)}$ denotes the within-population information; $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region; and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:
\[ 0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k} , \qquad \text{(C3)} \]
\[ 0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} . \qquad \text{(C4)} \]
When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions, with allele relative frequency $p_{i|+k}$ for allele i in region k and region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as
\[ p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k} . \]
Then the “gamma” entropy for this two-level system is
\[ H_\gamma = -\sum_{i=1}^{S} \Big( \sum_{k=1}^{K} w_{+k} p_{i|+k} \Big) \ln \Big( \sum_{m=1}^{K} w_{+m} p_{i|+m} \Big) . \]
The corresponding “alpha” entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.
To prove Eq. (C4), we consider the following two-level hierarchy (region $k$ and all populations within region $k$): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele $i$ in population $j$, with population weights $\{w_{1k}/w_{+k}, w_{2k}/w_{+k}, \cdots, w_{J_k k}/w_{+k}\}$, $j = 1, 2, \cdots, J_k$. The “gamma” entropy for this two-level hierarchy is the entropy value for region $k$, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding “alpha” entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem C1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln\frac{w_{jk}}{w_{+k}}.$$

Summing over $k$ with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}.$$
This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \tag{C6}$$
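The two normalized measures in Eqs. (C5) and (C6) can be checked numerically at their extremes. The R sketch below is only an illustration under stated assumptions: `delta_measures` is a hypothetical helper (not part of the authors' code), weights are taken proportional to population size, and both test data sets are made up.

```r
# Shannon entropy of a relative-frequency vector (zero entries are skipped)
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# Hypothetical helper: returns the normalized differentiation among regions
# (delta2, Eq. C5) and among populations within regions (delta1, Eq. C6)
delta_measures <- function(abun, region) {
  n      <- sum(abun)
  w_jk   <- colSums(abun) / n                  # population weights
  w_k    <- rowsum(w_jk, region)[, 1]          # region weights
  pooled <- t(rowsum(t(abun), region))         # counts pooled by region
  H_g    <- shannon(rowSums(abun) / n)
  H_a2   <- sum(w_k  * apply(pooled, 2, function(x) shannon(x / sum(x))))
  H_a1   <- sum(w_jk * apply(abun,   2, function(x) shannon(x / sum(x))))
  w_parent <- w_k[as.character(region)]        # weight of each population's region
  c(delta2 = (H_g - H_a2)  / (-sum(w_k * log(w_k))),
    delta1 = (H_a2 - H_a1) / (-sum(w_jk * log(w_jk / w_parent))))
}

region <- c(1, 1, 2, 2)

# Completely distinct populations: each holds one private allele
distinct  <- delta_measures(diag(5, 4), region)

# Identical populations: every column has the same allele counts
identical_pops <- delta_measures(matrix(rep(c(3, 2, 1), 4), nrow = 3), region)

print(rbind(distinct, identical_pops))
```

With no shared alleles both measures should equal 1, and with identical allele frequencies both should equal 0, up to floating-point error.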
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case each allele occurs in exactly one population, so we can decompose the gamma entropy as

$$
\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S}\left\{\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln\left(w_{jk}\, p_{i|jk}\right)\right\} \\
&= -\sum_{i=1}^{S}\left\{\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \left[\ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k}\right]\right\} \\
&= -\sum_{i=1}^{S}\left\{\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln p_{i|jk}\right\}
 - \sum_{i=1}^{S}\left\{\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln\frac{w_{jk}}{w_{+k}}\right\}
 - \sum_{i=1}^{S}\left\{\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln w_{+k}\right\} \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right].
\end{aligned}
$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
As we proved in Theorem C3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and “true dissimilarity” properties stated in Theorem C2.

Phylogenetic differentiation measures in three-level hierarchy
For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. Shannon gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i} L_i\, a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i} L_i\, a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \sum_{i} L_i\, a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \left[I_\alpha^{(2)} - I_\alpha^{(1)}\right] + \left[I_\gamma - I_\alpha^{(2)}\right]. \tag{C7}$$
Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem C4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth $T$:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and “true dissimilarity” properties, are parallel to those of allelic diversity and thus are omitted.
SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represents the ecosystem. Shannon entropies H∗ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 diagram: levels 3 (Ecosystem), 2 (Region) and 1 (Population or Community), with columns Within, Between, Total and Decomposition. Population entropies $H_{\alpha 11}^{(1)}, H_{\alpha 21}^{(1)}, H_{\alpha 31}^{(1)}, H_{\alpha 42}^{(1)}, H_{\alpha 52}^{(1)}$ are computed from $p_{i|11}, \ldots, p_{i|52}$; region entropies $H_{\alpha 1}^{(2)}, H_{\alpha 2}^{(2)}$ from $p_{i|+1}, p_{i|+2}$; and $H_\gamma$ from $p_{i|++}$.]
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \cdots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \cdots, p_5,\ p_1 + p_2,\ p_1 + p_2 + p_3,\ p_4 + p_5\}$, where $p_i$ is the relative abundance of species $i$.

[Figure S2 diagram: tips $p_1, \ldots, p_5$; internal nodes $p_1 + p_2$, $p_4 + p_5$ and $p_1 + p_2 + p_3$; root labelled MRCA.]
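The extended abundance set described for Figure S2 can be written out explicitly. In the R sketch below, the relative abundances `p` and the branch lengths `L` are hypothetical values chosen so that the tree is ultrametric with depth 3; only the topology follows the figure.

```r
# Hypothetical relative abundances of the five tip species (sum to 1)
p <- c(0.10, 0.20, 0.30, 0.25, 0.15)

# Extended abundance set: five tips, then the internal nodes
# (1,2), (1,2,3) and (4,5), in the order used in the caption
a <- c(p, p[1] + p[2], p[1] + p[2] + p[3], p[4] + p[5])

# Hypothetical branch lengths leading to each of the 8 nodes,
# chosen so that every root-to-tip distance equals 3
L <- c(1, 1, 2, 1.5, 1.5, 1, 1, 1.5)

# For an ultrametric tree, the abundance-weighted branch length
# sum(L * a) equals the tree depth T
T_bar <- sum(L * a)

# Phylogenetic entropy of the extended set, -sum(L * a * log(a))
I_phylo <- -sum(L * a * log(a))

print(c(T_bar = T_bar, I_phylo = I_phylo))
```

The check that `T_bar` equals the tree depth is a quick way to confirm that node abundances and branch lengths are listed in the same order.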
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R function iDIP)

Input should include two data matrices (called “Abun” and “Struc”, respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1: Population 1, Population 2
    Region 2: Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

          Pop1  Pop2  Pop3  Pop4  Pop5
Allele1      1    16     2    10    15
Allele2      0     0     0     5    14
Allele3      7    12    11     1     0
Allele4      0     5    14     1    21
Allele5      2     1     0    11    10
Allele6      0     1     3     2     0
The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

          Pop1         Pop2         Pop3         Pop4         Pop5
Level3    Ecosystem    Ecosystem    Ecosystem    Ecosystem    Ecosystem
Level2    Region1      Region1      Region2      Region2      Region2
Level1    Population1  Population2  Population3  Population4  Population5
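To make the role of the structure matrix concrete, the R sketch below shows one way a row of it can be used to pool population counts into regions. It uses the example values from the two tables above; `region_counts` and `region_weights` are illustrative names, and the use of `rowsum()` is a choice made for this sketch rather than the approach taken inside iDIP.

```r
# Example allele counts (alleles in rows, populations in columns)
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))

# Structure matrix: ecosystem, region and population labels for each column
Struc <- rbind(rep("Ecosystem", 5),
               c(rep("Region1", 2), rep("Region2", 3)),
               paste0("Population", 1:5))

# Pool the population columns that share the same region label (row 2 of Struc)
region_counts <- t(rowsum(t(Data), group = Struc[2, ]))

# Size-based region weights
region_weights <- colSums(region_counts) / sum(Data)

print(region_counts)
print(region_weights)
```

Region 1 pools Pop1 and Pop2 (45 gene copies in total) and Region 2 pools Pop3 to Pop5 (120 in total), so the region weights are 45/165 and 120/165.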
For the above simple example, input data for the R function iDIP (code given below) are shown below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))
    # Run R function
    iDIP(Data, Struc)

Output is given below.

                      [,1]
    D_gamma          5.272
    D_alpha2         4.679
    D_alpha1         3.540
    D_beta2          1.127
    D_beta1          1.322
    Proportion2      0.300
    Proportion1      0.700
    Differentiation2 0.204
    Differentiation1 0.310
We give simple interpretations for the above output (“effective” is always meant in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).

(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).

(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
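The multiplicative relations quoted in items (1b) and (1c) can be verified directly. The R sketch below recomputes the gamma diversity from the example data and compares the reported values; the `rep_` objects are simply the rounded numbers copied from the output above.

```r
# Example allele counts (alleles in rows, populations in columns)
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))

# Gamma diversity of order 1: exponential of the Shannon entropy
# of the pooled allele frequencies
p_pooled <- rowSums(Data) / sum(Data)
D_gamma  <- exp(-sum(p_pooled[p_pooled > 0] * log(p_pooled[p_pooled > 0])))

# Rounded values copied from the reported output
rep_alpha2 <- 4.679; rep_beta2 <- 1.127
rep_alpha1 <- 3.540; rep_beta1 <- 1.322

# alpha x beta at each level should reproduce the diversity one level up,
# up to rounding of the reported values
check_gamma  <- rep_alpha2 * rep_beta2
check_alpha2 <- rep_alpha1 * rep_beta1

print(round(c(D_gamma = D_gamma, check_gamma = check_gamma,
              check_alpha2 = check_alpha2), 3))
```

Small discrepancies in the third decimal place are expected because the reported alpha and beta values are themselves rounded.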
R code for Information-based Diversity Partitioning for Any Number of Levels

    # Input data:
    #   abun:  alleles-by-population frequency matrix data
    #          (or species-by-community frequency matrix data)
    #   struc: level-by-population hierarchical structure matrix
    #          (or level-by-community hierarchical structure matrix)
    # Output:
    #   (1) Gamma (or total) diversity, alpha and beta diversity at each level
    #   (2) Proportion of total beta information found at each level
    #   (3) Differentiation (dissimilarity) for each level. For example,
    #       Differentiation1 measures the mean dissimilarity among populations
    #       (level 1) within a region; Differentiation2 measures the mean
    #       dissimilarity among regions (level 2), etc.
    # NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace
    #       "blank" or "NA" in your data by 0 or any numerical value. Also, there
    #       must be at least one species/allele in a population or a community.

    # Main function iDIP
    iDIP = function(abun, struc) {
      n = sum(abun); N = ncol(abun)
      # Gamma entropy from pooled frequencies
      ga = rowSums(abun)
      gp = ga[ga > 0] / n
      G = sum(-gp * log(gp))
      S = length(gp)

      H = nrow(struc)
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      # Lowest level (populations): weights, weight entropy and alpha entropy
      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[H - 1] = sum(wi * Ai)

      # Intermediate levels: pool the populations belonging to each aggregate
      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          ai = matrix(0, ncol = NN, nrow = nrow(abun))
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              ai[, j] = abun[, II]
            } else {
              ai[, j] = rowSums(abun[, II])
            }
          }
          pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
          wi = colSums(ai) / sum(ai)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
          A[i - 1] = sum(wi * Ai)
        }
      }

      # Differentiation, proportion of beta information and beta diversity
      total = G - A[H - 1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      Gamma = exp(G); Alpha = exp(A)
      out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) <- c(paste0("D_gamma"),
                         paste0("D_alpha", (H - 1):1),
                         paste0("D_beta", (H - 1):1),
                         paste0("Proportion", (H - 1):1),
                         paste0("Differentiation", (H - 1):1))
      return(out)
    }
Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called “Abun” and “Struc”, respectively) as described for the species diversity, we also need to input a phylogenetic “Tree” (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the “Abun” matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for the species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity for illustration. A simulated phylogenetic tree for 6 species is given below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    rownames(Data) = paste0("Allele", 1:6)
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
    Tree = c("(((Allele1:1.666254448,Allele2:2.886156926):4.370264926,Allele3:5.919367445):4.349065302,(Allele4:0.967060281,Allele5:4.965919121,Allele6:1.5361314):5.492297125);")
    # Run R function
    iDIPphylo(Data, Struc, Tree)
Output is given below.

                   [,1]
    Faith's PD  32.1525
    mean_T       9.4169
    PD_gamma    27.4388
    PD_alpha2   25.5194
    PD_alpha1   22.3231
    PD_beta2     1.075
    PD_beta1     1.143
    PD_prop2     0.351
    PD_prop1     0.649
    PD_diff2     0.124
    PD_diff1     0.149
The “effective” in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 32.1525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 9.4169.

(3a) PD_gamma = 27.4388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 27.4388.

(3b) PD_alpha2 = 25.5194 means that the effective total branch length per region is 25.5194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 25.5194 x 1.075 = 27.4388 (= PD_gamma).

(3c) PD_alpha1 = 22.3231 means that the effective total branch length per population within each region is 22.3231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 22.3231 x 1.143 = 25.5194 (= PD_alpha2).

(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

    # Input data:
    #   abun:  alleles-by-population frequency matrix data
    #          (or species-by-community frequency matrix data)
    #   struc: level-by-population hierarchical structure matrix
    #          (or level-by-community hierarchical structure matrix)
    #   tree:  a Newick-format phylogenetic tree spanned by all focal species
    #          considered in a study
    # NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank"
    #       or "NA" in your data by 0 or any numerical value. Also, there must be
    #       at least one species/allele in a population or a community.
    # Output:
    #   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
    #   (2) mean_T: weighted (by species abundance) mean of the distances from
    #       the root node to each of the tips in a phylogenetic tree. For an
    #       ultrametric tree, mean_T = tree depth.
    #   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and
    #       beta PD for each level
    #   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic
    #       beta information found in Level 1 and Level 2, respectively
    #   (5) Phylogenetic differentiation (dissimilarity) for each level. For
    #       example, PD_diff1 measures the mean phylogenetic dissimilarity among
    #       communities (Level 1) within a region (Level 2); PD_diff2 measures
    #       the mean phylogenetic dissimilarity among regions (Level 2), etc.

    # Three packages, "ade4", "ape" and "phytools", must be installed first
    install.packages("ade4"); library(ade4)
    install.packages("ape"); library(ape)
    install.packages("phytools"); library(phytools)

    # Main function iDIPphylo
    iDIPphylo = function(abun, struc, tree) {
      phyloData <- newick2phylog(tree)
      Temp <- as.matrix(abun[names(phyloData$leaves), ])
      nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
      # M[i, b] = 1 if branch b lies on the path from the root to leaf i
      M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
                 dimnames = list(names(phyloData$leaves), nodenames))
      for (i in 1:length(phyloData$leaves)) {
        M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
      }

      # Node (branch) abundances in each population and branch lengths
      pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
                  dimnames = list(nodenames, colnames(abun)))
      for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
      pB = c(phyloData$leaves, phyloData$nodes)

      n = sum(abun); N = ncol(abun)
      ga = rowSums(pA)
      gp = ga / n; TT = sum(gp * pB)
      G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
      PD = sum(pB[gp > 0])

      H = nrow(struc)
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      # Lowest level (populations)
      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[H - 1] = sum(wi * Ai)

      # Intermediate levels: pool the populations belonging to each aggregate
      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
            } else {
              pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
            }
          }
          wi = ni / sum(ni)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
          A[i - 1] = sum(wi * Ai)
        }
      }

      # Differentiation, proportion of beta information and beta diversity
      total = G - A[H - 1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      Gamma = exp(G); Alpha = exp(A)
      out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) <- c(paste("Faith's PD"),
                         paste("mean_T"),
                         paste0("PD_gamma"),
                         paste0("PD_alpha", (H - 1):1),
                         paste0("PD_beta", (H - 1):1),
                         paste0("PD_prop", (H - 1):1),
                         paste0("PD_diff", (H - 1):1))
      return(out)
    }
populations each). Note that here we used a hierarchy with four spatial subdivisions instead of three levels as used in the presentation of the framework. This decision was based on the fact that we wanted to simplify the presentation of calculations (three levels used), whereas in the simulations (four levels used) we wanted to verify the performance of the framework in a more in-depth manner.

We explored six scenarios varying in the degree of genetic structuring from very strong (Figure 2, top left panel) to very weak (Figure 2, bottom right panel), and for each, we generated spatially structured genetic data for 10 unlinked bi-allelic loci using an algorithm loosely based on the genetic model of Coop, Witonsky, Di Rienzo, and Pritchard (2010). More explicitly, to generate correlated allele frequencies across populations for bi-allelic loci, we drew 10 random vectors of dimension 32 from a multivariate normal distribution with mean zero and a covariance matrix corresponding to the particular genetic structure scenario being considered. To construct the covariance matrix, we first assumed that the covariance between populations decreased with distance, so that the off-diagonal elements (covariances) for closest geographic neighbours were set to 4, those for the second nearest neighbours were set to 3, and so on; the main diagonal values (variance) were set to 5. By multiplying the off-diagonal elements of this variance–covariance matrix by a constant (δ), we manipulated the strength of the spatial genetic structure from strong (δ = 0.1; Figure 2) to weak (δ = 6; Figure 2). Delta values were chosen to demonstrate gradual changes in estimates across diversity components. Using this procedure, we generated a matrix of random normally distributed N(0,1) deviates $\varepsilon_{il}$ for each population i and locus l. The random deviates were transformed into allele frequencies constrained between 0 and 1 using the simple transform

$$
p_{il} =
\begin{cases}
0 & \text{if } \varepsilon_{il} < 0 \\
\varepsilon_{il} & \text{if } 0 \le \varepsilon_{il} \le 1 \\
1 & \text{if } \varepsilon_{il} > 1
\end{cases}
$$

where $p_{il}$ is the relative frequency of allele A1 at the lth locus in population i, and therefore $q_{il} = (1 - p_{il})$ is the relative frequency of allele A2. Each bi-allelic locus was analysed separately by our framework, and estimated values of $D_\gamma$, $D_\alpha$ and $D_\beta$ for each spatial level (see Figure 1) were averaged across the 10 loci.

To simulate a realistic distribution of number of individuals across populations, we generated random values from a log-normal distribution with mean 0 and log of standard deviation 1; these values were then multiplied by randomly generated deviates from a Poisson distribution with λ = 30 to obtain a wide range of population/community sizes. Rounded values (to mimic abundances of individuals) were then multiplied by $p_{il}$ and $q_{il}$ to generate allele abundances. Given that the number of individuals was randomly generated across populations, there is no spatial correlation in abundance of individuals across the landscape, which means that the genetic spatial patterns were solely determined by the variance–covariance matrix used to generate correlated allele frequencies across populations. This facilitates interpretation of the simulation results, allowing us to demonstrate that the framework can uncover subtle spatial effects associated with population connectivity (see below).

For each spatial structure, we generated 100 matrices of allele frequencies, and each matrix was analysed separately to obtain distributions for $D_\gamma$, $D_\alpha$, $D_\beta$ and $\Delta_D$. Figure 2 presents heat maps of the correlation in allele frequencies across populations for one simulated data set under each δ value and shows that our algorithm can generate a wide range of genetic structures comparable to those generated by other more complex simulation protocols (e.g., de Villemereuil, Frichot, Bazin, Francois, & Gaggiotti, 2014).

Figure 3 shows the distribution of $D_\alpha$, $D_\beta$ and $\Delta_D$ values for the three levels of geographic variation below the ecosystem level (i.e., $D_\gamma$ genetic diversity). The results clearly show that our framework detects differences in genetic diversity across different levels of spatial genetic structure. As expected, the effective number of alleles ($D_\alpha$ component, top row) increases per region and subregion as the spatial structure becomes weaker (i.e., from small to large δ values) but remains constant at the population level, as there is no spatial structure at this level (i.e., populations are panmictic), so diversity is independent of δ.

The $D_\beta$ component (middle row) quantifies the effective number of aggregates (regions, subregions, populations) at each hierarchical level of spatial subdivision. The larger the number of aggregates at a given level, the more heterogeneous that level is. Thus, it is also a measure of compositional dissimilarity at each level. We use this interpretation to describe the results in a more intuitive manner. As expected, as δ increases, dissimilarity between regions (middle left panel) decreases because spatial genetic structure becomes weaker, and the compositional dissimilarity among populations within subregions (middle right panel) increases because the strong spatial correlation among populations within subregions breaks down (Figure 3, centre left panel). The compositional dissimilarity between subregions within regions (Figure 3, middle centre panel) first increases and then decreases with increasing δ. This is due to an “edge effect” associated with the marginal status of the subregions at the extremes of the species range (extreme right and left subregions in Figure 2). As δ increases, the composition of the two subregions at the centre of the species range, which belong to different regions, changes more rapidly than that of the two marginal subregions. Thus, the compositional dissimilarity between subregions within regions increases. However, as δ continues to increase, spatial effects disappear and dissimilarity decreases.
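The truncation transform and the abundance-generation step of the simulation can be sketched in R as follows. This is not the authors' simulation code: the deviates here are independent standard normals standing in for the spatially correlated ones, `sdlog = 1` is an assumed reading of the log-normal parameterization, and all object names are hypothetical.

```r
set.seed(1)
n_pop  <- 32   # populations
n_loci <- 10   # unlinked bi-allelic loci

# Stand-in deviates (assumption: independent N(0,1), no spatial covariance)
eps <- matrix(rnorm(n_pop * n_loci), nrow = n_pop)

# Truncate the deviates to [0, 1] to obtain frequencies of allele A1
p <- pmin(pmax(eps, 0), 1)
q <- 1 - p                       # frequencies of allele A2

# Population sizes: log-normal values times Poisson deviates, then rounded
size <- round(rlnorm(n_pop, meanlog = 0, sdlog = 1) * rpois(n_pop, lambda = 30))

# Allele abundances per population (rows) and locus (columns);
# 'size' is recycled down each column, so row i is scaled by size[i]
counts_A1 <- size * p
counts_A2 <- size * q

summary(size)
```

Because the sizes are drawn independently of the allele frequencies, any spatial pattern in the resulting counts can only come from the frequency matrix, which is the property the text relies on.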
FIGURE 2 Heat maps of allele frequency correlations between pairs of populations for different δ values. Delta values control the strength of the spatial genetic structure among populations, with low δs having the strongest spatial correlation among populations. Each heat map represents the outcome of a single simulation, and each dot represents the allele frequency correlation between two populations. Thus, the diagonal represents the correlation of a population with itself and is always 1 regardless of the δ value considered in the simulation. Colours indicate range of correlation values. As in Figure 1, the dendrograms represent the spatial relationship (i.e., geographic distance) between populations.
The differentiation component $\Delta_D$ (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity $D_\beta$. This is expected, as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix in which different δ values would be used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength in spatial genetic variation.

FIGURE 3 Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity, gamma, is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions).
For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ: $D_\gamma$ = 1.6 on average across simulations for δ = 0.1, up to $D_\gamma$ = 1.9 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here region, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.
7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM

All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish, Etelis coruscans (Andrews et al., 2014), and a shallow-water fish, Zebrasoma flavescens (Eble et al., 2011).
The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km southwest of the MHI, at Johnston Atoll, and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).

FIGURE 4 Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass.
We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.

7.1 | Species diversity
Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, $D_\gamma$, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_\alpha^{(2)}$ = 37.77; Island: $D_\alpha^{(1)}$ = 27.75). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_\beta^{(2)}$ = 1.29 represents the number of region equivalents in the Hawaiian archipelago, while $D_\beta^{(1)}$ = 1.361 is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on the sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain $\Delta_D$ (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
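As a quick arithmetic check of the multiplicative structure behind these numbers, the values reported in Table 4 can be multiplied level by level. The identities below restate the alpha-times-beta relation of the framework; the small discrepancies are due to rounding of the reported values.

```latex
\begin{align*}
D_\alpha^{(2)} \times D_\beta^{(2)} &= 37.773 \times 1.290 \approx 48.73 \approx D_\gamma = 48.744,\\
D_\alpha^{(1)} \times D_\beta^{(1)} &= 27.752 \times 1.361 \approx 37.77 \approx D_\alpha^{(2)} = 37.773,\\
D_\gamma - D_\alpha^{(2)} &\approx 11.0, \qquad D_\alpha^{(2)} - D_\alpha^{(1)} \approx 10.0 .
\end{align*}
```

The last line corresponds to the statement that roughly 10 species equivalents are lost at each step down the hierarchy.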
We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than those of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and to a lesser extent Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
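The effect of the order q on the weight given to rare species can be illustrated with a short R sketch of Hill numbers. The function `hill` and both abundance vectors are hypothetical; the uneven vector merely mimics a community in which one species holds a little over half of the individuals.

```r
# Hill number (effective number of species) of order q
hill <- function(p, q) {
  p <- p[p > 0]
  if (q == 1) {
    exp(-sum(p * log(p)))        # limit as q -> 1: exponential of Shannon entropy
  } else {
    sum(p^q)^(1 / (1 - q))
  }
}

# Hypothetical relative abundances
even_comm      <- rep(1 / 4, 4)                           # four equally common species
dominated_comm <- c(0.55, 0.20, 0.10, 0.08, 0.05, 0.02)   # one dominant species

orders <- c(0, 1, 2)
D_even      <- sapply(orders, function(q) hill(even_comm, q))
D_dominated <- sapply(orders, function(q) hill(dominated_comm, q))

print(rbind(q = orders, D_even = D_even, D_dominated = D_dominated))
```

For the even community all three orders give the same value (the number of species), whereas for the dominated community the values drop as q increases, because rare species contribute less and less.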
Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.
72emsp|emspGenetic Diversity
Tables 5 and 6 present the decomposition of genetic diversity forEtelis coruscans and Zebrasoma flavescens respectively They bothmaintain similar amounts of genetic diversity at the ecosystem level abouteightalleleequivalentsandinbothcasesgeneticdiversityatthe regional level is only slightly higher than that maintained at the islandlevel(lessthanonealleleequivalenthigher)apatternthatcon-trastwithwhatisobservedforspeciesdiversity(seeabove)Finallybothspeciesexhibitsimilarpatternsofgeneticstructuringwithdif-ferentiation between regions being less than half that observed
TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

Level                       Diversity
3 Hawaiian Archipelago      $D_\gamma = 48.744$
2 Region                    $D_\gamma^{(2)} = D_\gamma$;  $D_\alpha^{(2)} = 37.773$;  $D_\beta^{(2)} = 1.290$
1 Island (community)        $D_\gamma^{(1)} = D_\alpha^{(2)}$;  $D_\alpha^{(1)} = 27.752$;  $D_\beta^{(1)} = 1.361$

Differentiation among aggregates at each level
2 Region                    $\Delta_D^{(2)} = 0.290$
1 Island (community)        $\Delta_D^{(1)} = 0.153$
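The multiplicative structure of the decomposition reported in Table 4 (gamma diversity at each level equals alpha times beta at that level) can be checked with simple arithmetic. The sketch below only re-uses the values printed in the table; the helper `relative_gap` is a hypothetical name introduced for illustration, and small discrepancies are expected from rounding to three decimals.

```python
# Values transcribed from Table 4 (fish species diversity, order q = 1).
d_gamma = 48.744
d_alpha_2, d_beta_2 = 37.773, 1.290   # regional level
d_alpha_1, d_beta_1 = 27.752, 1.361   # island (community) level

def relative_gap(product, target):
    """Relative difference between a reconstructed and a reported value."""
    return abs(product - target) / target

# Level 2: D_gamma = D_alpha^(2) * D_beta^(2)
region_gap = relative_gap(d_alpha_2 * d_beta_2, d_gamma)
# Level 1: D_gamma^(1) = D_alpha^(2) = D_alpha^(1) * D_beta^(1)
island_gap = relative_gap(d_alpha_1 * d_beta_1, d_alpha_2)
# Full chain: D_gamma = D_alpha^(1) * D_beta^(1) * D_beta^(2)
chain_gap = relative_gap(d_alpha_1 * d_beta_1 * d_beta_2, d_gamma)

print(region_gap, island_gap, chain_gap)
```

All three gaps are below 0.1%, consistent with rounding of the tabulated values.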
among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that, despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.
Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.
As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite the higher level of anthropogenic perturbation in MHI (Figure 6b,c).
8 | DISCUSSION
Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications, including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.
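The three-level decomposition summarized above can be sketched in a few lines. This is a minimal illustration and not the R package mentioned in the text: it assumes that populations are weighted by sample size, and it follows our reading of the definitions in Table 3, namely $D = \exp(H)$ at each level, $\Delta_D^{(2)} = (H_\gamma - H_\alpha^{(2)}) / (-\sum_k w_{+k} \ln w_{+k})$ and $\Delta_D^{(1)} = (H_\alpha^{(2)} - H_\alpha^{(1)}) / (-\sum_{j,k} w_{jk} \ln (w_{jk}/w_{+k}))$. The function names and the toy data are hypothetical.

```python
import math

def entropy(freqs):
    """Shannon entropy (natural log) of a frequency vector."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def decompose(counts):
    """Order q = 1 decomposition for ecosystem > region > population.

    counts maps region -> population -> list of element counts
    (all lists use the same element order). Weights are proportional
    to sample size, which is an assumption of this sketch.
    """
    total = sum(sum(c) for pops in counts.values() for c in pops.values())
    first_region = next(iter(counts.values()))
    n_elements = len(next(iter(first_region.values())))
    pooled = [0.0] * n_elements
    h_alpha1 = h_alpha2 = 0.0
    max_delta1 = max_delta2 = 0.0
    for pops in counts.values():
        region_counts = [sum(c[i] for c in pops.values()) for i in range(n_elements)]
        region_total = sum(region_counts)
        w_region = region_total / total
        h_alpha2 += w_region * entropy([x / region_total for x in region_counts])
        max_delta2 -= w_region * math.log(w_region)
        for c in pops.values():
            n = sum(c)
            h_alpha1 += (n / total) * entropy([x / n for x in c])
            max_delta1 -= (n / total) * math.log(n / region_total)
        pooled = [a + b for a, b in zip(pooled, region_counts)]
    h_gamma = entropy([x / total for x in pooled])
    return {
        "D_gamma": math.exp(h_gamma),
        "D_alpha2": math.exp(h_alpha2),
        "D_beta2": math.exp(h_gamma - h_alpha2),
        "D_alpha1": math.exp(h_alpha1),
        "D_beta1": math.exp(h_alpha2 - h_alpha1),
        "Delta2": (h_gamma - h_alpha2) / max_delta2 if max_delta2 > 0 else 0.0,
        "Delta1": (h_alpha2 - h_alpha1) / max_delta1 if max_delta1 > 0 else 0.0,
    }

# Toy data: two regions sharing no elements, identical populations within regions.
toy = {
    "A": {"a1": [5, 5, 0, 0], "a2": [5, 5, 0, 0]},
    "B": {"b1": [0, 0, 5, 5], "b2": [0, 0, 5, 5]},
}
result = decompose(toy)
print(result)
```

For the toy data the regions are completely distinct and the populations within each region are identical, so differentiation is maximal among regions and zero among populations within regions.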
Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and therefore give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g.,
FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities, (b) genetic diversity for Etelis coruscans, (c) genetic diversity for Zebrasoma flavescens
[Panels: (a) species diversity; (b) E. coruscans; (c) Z. flavescens]
bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts the Hill number as a function of the order q ≥ 0, contains all information about alleles/species abundance distributions.
As proved by Chao et al. (2015, appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population, and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as $G_{ST}$ and Jost's D, do not possess either of these two properties.
Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and
FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness, (b) allelic richness for Etelis coruscans, (c) allelic richness for Zebrasoma flavescens
[Panel titles: (a) Species diversity; (b) Genetic diversity, E. coruscans; (c) Genetic diversity, Z. flavescens]
TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level                       Diversity
3 Hawaiian Archipelago      $D_\gamma = 8.249$
2 Region                    $D_\gamma^{(2)} = D_\gamma$;  $D_\alpha^{(2)} = 8.083$;  $D_\beta^{(2)} = 1.016$
1 Island (population)       $D_\gamma^{(1)} = D_\alpha^{(2)}$;  $D_\alpha^{(1)} = 7.077$;  $D_\beta^{(1)} = 1.117$

Differentiation among aggregates at each level
2 Region                    $\Delta_D^{(2)} = 0.023$
1 Island (community)        $\Delta_D^{(1)} = 0.062$
our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.
Recently, Karlin and Smouse (2017, Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be stated as follows: if N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%, whereas ours gives the true nonshared proportion of 60%. The second important difference is that, when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
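The "true dissimilarity" example can be verified numerically. The sketch below assumes the two-level case with equal population weights, in which the Shannon differentiation is $(H_\gamma - H_\alpha)/\ln N$; the helper names `make_populations` and `shannon_differentiation` are hypothetical. It does not attempt to reproduce Karlin and Smouse's measure.

```python
import math

def entropy(freqs):
    """Shannon entropy (natural log) of a collection of frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def make_populations(n_pops, s, a):
    """n_pops populations, each with s equally frequent alleles, a shared by all."""
    populations = []
    for j in range(n_pops):
        labels = [f"shared_{i}" for i in range(a)]
        labels += [f"private_{j}_{i}" for i in range(s - a)]
        populations.append({label: 1.0 / s for label in labels})
    return populations

def shannon_differentiation(populations):
    """Equal-weight Shannon differentiation: (H_gamma - H_alpha) / ln(N)."""
    n = len(populations)
    h_alpha = sum(entropy(pop.values()) for pop in populations) / n
    pooled = {}
    for pop in populations:
        for label, freq in pop.items():
            pooled[label] = pooled.get(label, 0.0) + freq / n
    h_gamma = entropy(pooled.values())
    return (h_gamma - h_alpha) / math.log(n)

# Populations I and II: 10 equally frequent alleles each, 4 shared.
delta = shannon_differentiation(make_populations(2, 10, 4))
print(round(delta, 4))  # 0.6, i.e. 1 - 4/10
```

The same result follows analytically: with equal weights, $H_\gamma = \ln S + (1 - A/S) \ln N$ and $H_\alpha = \ln S$, so the ratio is exactly $1 - A/S$ for any N, S and A.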
In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown, and all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would surely be observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., obtained by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity $D_\beta$ and differentiation $\Delta_D$ measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.
The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.
Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural
TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

Level                       Diversity
3 Hawaiian Archipelago      $D_\gamma = 8.404$
2 Region                    $D_\gamma^{(2)} = D_\gamma$;  $D_\alpha^{(2)} = 8.290$;  $D_\beta^{(2)} = 1.012$
1 Island (community)        $D_\gamma^{(1)} = D_\alpha^{(2)}$;  $D_\alpha^{(1)} = 7.690$;  $D_\beta^{(1)} = 1.065$

Differentiation among aggregates at each level
2 Region                    $\Delta_D^{(2)} = 0.014$
1 Island (community)        $\Delta_D^{(1)} = 0.033$
populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.
We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization, and therefore they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
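The percentages quoted above can be recovered from Tables 4 to 6. The sketch below assumes they are computed as the relative drop from the regional to the island alpha diversity, $(D_\alpha^{(2)} - D_\alpha^{(1)}) / D_\alpha^{(2)}$; this formula is our inference from the tabulated values rather than a definition stated in the text, and the name `relative_drop` is hypothetical.

```python
# (regional alpha diversity, island alpha diversity) from Tables 4, 5 and 6.
alpha_by_data_set = {
    "fish species (Table 4)": (37.773, 27.752),
    "E. coruscans (Table 5)": (8.083, 7.077),
    "Z. flavescens (Table 6)": (8.290, 7.690),
}

def relative_drop(regional, island):
    """Percentage decrease in diversity from regional to island level."""
    return 100.0 * (regional - island) / regional

drops = {name: relative_drop(*pair) for name, pair in alpha_by_data_set.items()}
for name, value in drops.items():
    print(f"{name}: {value:.2f}%")
```

Under this assumption the values are roughly 26.5%, 12.4% and 7.2%, which match the figures in the text once truncated.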
In our hierarchical framework and analysis based on a Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended the phylogenetic diversity of order q = 1 of Chao et al. (2010) to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). The measure of Chao et al. (2010) of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals, or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, the information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithms may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.
The application of our framework to molecular data is performed under the assumption of the infinite-allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.
An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.
ACKNOWLEDGEMENTS
This work was assisted through participation in the "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from the NOAA Coral Reef Conservation Program. O. E. G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A. C. and C. H. C. were supported by the Ministry of Science and Technology, Taiwan. P. P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K. A. S. was supported by the National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.
DATA ARCHIVING STATEMENT
All data used in this manuscript are available in DRYAD (https://doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).
ORCID
Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493
Christine Edwards http://orcid.org/0000-0001-8837-4872
REFERENCES
Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2
Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101
Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations (2nd ed.). Hoboken, NJ: Wiley-Blackwell.
Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665
Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.
Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.
Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1 (edited by S. E. Fienberg), 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654
Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551
Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247
Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272
Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45 (edited by D. J. Futuyma), 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540
Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471
Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014
Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819
Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083
Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1
Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2
Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036
Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75
Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352
Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x
Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x
Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1
Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x
Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum x falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.
Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.
Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.
Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3
Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x
Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.
Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x
Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x
Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x
Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.
Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x
Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.
Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.
Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696
Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B: Biological Sciences, 283(1829).
Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z
Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765
Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x
Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.
Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422
Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756
de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.
Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x
Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516
Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.
Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.
How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593
APPENDIX S1 Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
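The equivalence between the two populations can be checked directly. The sketch below assumes the usual definitions, expected heterozygosity $H_e = 1 - \sum_i p_i^2$ and effective number of alleles $1/\sum_i p_i^2 = 1/(1 - H_e)$ (the Hill number of order q = 2); the function names are hypothetical.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def effective_number_of_alleles(freqs):
    """Effective number of alleles of order q = 2, i.e. 1 / (1 - He)."""
    return 1.0 / sum(p * p for p in freqs)

# Allele frequencies from the table above (alleles A1 to A5).
population_1 = [0.5, 0.5, 0.0, 0.0, 0.0]
population_2 = [0.01, 0.10, 0.665, 0.215, 0.01]

for name, freqs in (("Population 1", population_1), ("Population 2", population_2)):
    he = expected_heterozygosity(freqs)
    ne = effective_number_of_alleles(freqs)
    print(f"{name}: He = {he:.3f}, effective alleles = {ne:.3f}")
```

Population 2 gives an expected heterozygosity of about 0.501 and an effective number of about 2.005, which agree with the tabulated 0.5 and the stated value of two up to rounding.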
APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol                      Definition
$^qD$                       abundance diversity of order q, also referred to as Hill number of order q
$^qPD$                      phylogenetic diversity of order q
$S$                         total number of distinct elements (species/alleles)
$H$                         Shannon entropy
$K$                         number of regions
$J_k$                       number of local populations/communities within region k
$w_{jk}$                    weight given to population/community j of region k
$w_{+k}$                    weight given to region k
$D_\alpha^{(l)}$            alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$             beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$            gamma abundance diversity at level l of the hierarchy
$l$                         superscript to denote population/community, subregion, region, …
$D_\gamma$                  gamma abundance diversity at the ecosystem level
$N_{injk}$                  number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$                   total number of copies of allele/species i in population/community j of region k
$N_{+jk}$                   total number of alleles/individuals in population j of region k
$N_{++k}$                   total number of alleles/individuals in region k
$N_{+++}$                   total number of alleles/individuals in the ecosystem
$p_{i|jk}$                  frequency of allele/species i in population j of region k
$p_{i|+k}$                  frequency of allele/species i in region k
$p_{i|++}$                  frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$       alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$        alpha entropy for region k
$H_\alpha^{(l)}$            total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$            abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$                         number of branch segments in the phylogenetic tree
$L_i$                       length of branch i = 1, 2, 3, …, B
$a_i$                       total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$                   mean branch length
$T$                         depth of an ultrametric tree ($= \bar{T}$)
$Q$                         Rao's quadratic entropy
$I$                         phylogenetic entropy
$a_{i|jk}$                  total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$                  total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$                  total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$           alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$            beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$           gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$                 gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$       alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$        alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$            total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$         phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)
APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term “allele frequency” with “species frequency”. We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j'=1}^{J} w_{j'} p_{i|j'}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$
Theorem S3.1 For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.
Proof. Since $f(x) = -x \ln x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

$$
\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j'=1}^{J} w_{j'} p_{i|j'}\Big) \\
&\le -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j} \ln\big(w_j p_{i|j}\big)\Big) \\
&= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j \\
&= H_\alpha - \sum_{j=1}^{J} w_j \ln w_j .
\end{aligned}
$$

The inequality holds because, for each allele i and population j, the pooled frequency $\sum_{j'} w_{j'} p_{i|j'}$ is at least $w_j p_{i|j}$. When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$
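Eq. (C1) and the two extreme cases of Theorem S3.1 can be illustrated with a short R sketch. The helper names `shannon` and `shannon_diff`, the weights and the toy frequency matrices are hypothetical and chosen only for illustration; they are not part of the iDIP code.

```r
# Shannon entropy (natural log); zero frequencies contribute nothing.
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Normalized Shannon differentiation (Eq. C1) for a two-level hierarchy.
# freq: alleles (rows) x populations (columns), each column summing to 1
# w:    population weights summing to 1
shannon_diff <- function(freq, w) {
  h_gamma <- shannon(as.vector(freq %*% w))        # entropy of pooled frequencies
  h_alpha <- sum(w * apply(freq, 2, shannon))      # weighted mean within-population entropy
  (h_gamma - h_alpha) / shannon(w)                 # divide by -sum(w * log(w))
}

w <- c(0.2, 0.3, 0.5)

# Case 1: all populations share the same allele frequency distribution.
identical_pops <- cbind(c(0.6, 0.4, 0), c(0.6, 0.4, 0), c(0.6, 0.4, 0))

# Case 2: no allele is shared between populations.
distinct_pops <- cbind(c(0.6, 0.4, 0, 0, 0, 0),
                       c(0, 0, 0.7, 0.3, 0, 0),
                       c(0, 0, 0, 0, 0.5, 0.5))

d_identical <- shannon_diff(identical_pops, w)
d_distinct  <- shannon_diff(distinct_pops, w)
print(c(identical = d_identical, distinct = d_distinct))
```

Up to floating-point error, the first case gives 0 and the second gives 1, matching the bounds of the theorem.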
This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2 Desirable monotonicity and “true dissimilarity” properties for Shannon differentiation

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
    (A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
    (A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance.
    Here the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation also satisfies the “true dissimilarity” property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.
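Part (B) can be checked numerically. The sketch below assumes three communities with equal weights, S = 5 equally common species each and A = 2 species shared by all; the helper `build_communities` and these particular values are hypothetical choices for illustration.

```r
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Build a species x community frequency matrix in which each community holds
# S equally common species, A of which are shared by every community.
build_communities <- function(n_comm, S, A) {
  total <- A + n_comm * (S - A)
  freq <- matrix(0, nrow = total, ncol = n_comm)
  for (j in seq_len(n_comm)) {
    if (A > 0) freq[seq_len(A), j] <- 1 / S            # shared species
    own <- A + (j - 1) * (S - A) + seq_len(S - A)      # species unique to community j
    freq[own, j] <- 1 / S
  }
  freq
}

n_comm <- 3; S <- 5; A <- 2
freq <- build_communities(n_comm, S, A)
w <- rep(1 / n_comm, n_comm)                           # equal weights (assumption)

h_gamma <- shannon(as.vector(freq %*% w))
h_alpha <- sum(w * apply(freq, 2, shannon))
delta_D <- (h_gamma - h_alpha) / shannon(w)

print(c(delta_D = delta_D, expected = 1 - A / S))
```

Under these assumptions the computed differentiation coincides with 1 − A/S, the proportion of non-shared species in a community.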
Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++},$$

$$H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived, so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3 For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions, with allele relative frequency $p_{i|+k}$ for allele i in region k, and region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the “gamma” entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} w_{+k} p_{i|+k}\Big) \ln\Big(\sum_{k'=1}^{K} w_{+k'} p_{i|+k'}\Big).$$

The corresponding “alpha” entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations, with allele relative frequency $p_{i|jk}$ for allele i in population j, and population weights $(w_{1k}/w_{+k},\; w_{2k}/w_{+k},\; \ldots,\; w_{J_k k}/w_{+k})$, j = 1, 2, …, $J_k$. The “gamma” entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha,k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding “alpha” entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha,jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha,k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha,jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln\frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\big(w_{jk}/w_{+k}\big)}. \tag{C6}$$
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case we can decompose the gamma entropy as

$$
\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\big(w_{jk} p_{i|jk}\big)\Big] \\
&= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big(\ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big)\Big] \\
&= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk}\Big]
   - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\frac{w_{jk}}{w_{+k}}\Big]
   - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k}\Big] \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big].
\end{aligned}
$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
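The special case above (completely distinct populations) can be verified on a toy data set. The following R sketch uses hypothetical counts for four populations in two regions, with weights taken as relative population sizes; all object names are illustrative.

```r
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Toy counts: alleles (rows) x populations (columns); no allele is shared.
abun <- cbind(P1 = c(4, 6, 0, 0, 0, 0, 0),
              P2 = c(0, 0, 5, 0, 0, 0, 0),
              P3 = c(0, 0, 0, 3, 7, 0, 0),
              P4 = c(0, 0, 0, 0, 0, 10, 5))
region <- c("R1", "R1", "R2", "R2")            # region membership of each population

w_jk <- colSums(abun) / sum(abun)              # population weights w_{jk}
w_k  <- sapply(split(w_jk, region), sum)       # region weights w_{+k}

# Pool the populations of each region (alleles x regions).
reg_abun <- sapply(split(seq_len(ncol(abun)), region),
                   function(idx) rowSums(abun[, idx, drop = FALSE]))

h_gamma  <- shannon(rowSums(abun) / sum(abun))
h_alpha2 <- sum(w_k  * apply(reg_abun, 2, function(x) shannon(x / sum(x))))
h_alpha1 <- sum(w_jk * apply(abun,     2, function(x) shannon(x / sum(x))))

max2 <- -sum(w_k * log(w_k))                   # upper bound in Eq. (C3)
max1 <- -sum(w_jk * log(w_jk / w_k[region]))   # upper bound in Eq. (C4)

diff2 <- (h_gamma  - h_alpha2) / max2          # Eq. (C5)
diff1 <- (h_alpha2 - h_alpha1) / max1          # Eq. (C6)
print(c(diff2 = diff2, diff1 = diff1))
```

Because no allele is shared, both normalized measures reach their maximum of 1, and the gamma entropy equals the sum of the alpha entropy and the two upper bounds.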
As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and “true dissimilarity” properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy
For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i\, a_{i|++} \ln a_{i|++},$$

$$I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i\, a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i\, a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[I_\alpha^{(2)} - I_\alpha^{(1)}\big] + \big[I_\gamma - I_\alpha^{(2)}\big]. \tag{C7}$$
Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem, so that we can obtain the corresponding differentiation measures.

Theorem S3.4 For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and “true dissimilarity” properties, are parallel to those of allelic diversity and thus are omitted.
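SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue rectangles represent regions, and the red rectangle represents the ecosystem. Shannon entropies $H_{*}$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1: schematic with rows for level 3 (Ecosystem), level 2 (Region) and level 1 (Population or Community) and columns Within, Between, Total and Decomposition. Populations 1–3 (frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$; entropies $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$) belong to region 1 ($p_{i|+1}$, $H_{\alpha 1}^{(2)}$), and populations 4–5 ($p_{i|42}$, $p_{i|52}$; $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$) belong to region 2 ($p_{i|+2}$, $H_{\alpha 2}^{(2)}$); the ecosystem has frequencies $p_{i|++}$ and entropy $H_\gamma$, with totals $H_\alpha^{(1)}$ and $H_\alpha^{(2)}$.]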
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2: ultrametric tree with tips $p_1$ to $p_5$, interior nodes labelled $p_1 + p_2$, $p_4 + p_5$ and $p_1 + p_2 + p_3$, and the root labelled MRCA.]
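The extended abundance set of Figure S2 can be built with a node-by-tip membership matrix. The sketch below is illustrative only: the tip abundances in `p` are hypothetical, and the membership rows simply encode the three interior nodes named in the caption.

```r
# Hypothetical relative abundances of the five tips (species/alleles).
p <- c(0.10, 0.20, 0.30, 0.25, 0.15)

# Node-by-tip membership: row i marks the tips descended from node i.
# Rows 1-5 are the tips themselves; rows 6-8 are the interior nodes.
membership <- rbind(diag(5),
                    c(1, 1, 0, 0, 0),   # node 6: p1 + p2
                    c(1, 1, 1, 0, 0),   # node 7: p1 + p2 + p3
                    c(0, 0, 0, 1, 1))   # node 8: p4 + p5

# Extended abundance set a_1, ..., a_8.
a <- as.vector(membership %*% p)
print(a)
```

The first five entries reproduce the tip abundances and the last three are the summed abundances of the tips below each interior node.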
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called “Abun” and “Struc”, respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1
        Population 1
        Population 2
    Region 2
        Population 3
        Population 4
        Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

           Pop1   Pop2   Pop3   Pop4   Pop5
Allele1       1     16      2     10     15
Allele2       0      0      0      5     14
Allele3       7     12     11      1      0
Allele4       0      5     14      1     21
Allele5       2      1      0     11     10
Allele6       0      1      3      2      0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure of any number of levels can be expressed in a similar manner.

          Pop1          Pop2          Pop3          Pop4          Pop5
Level3    Ecosystem     Ecosystem     Ecosystem     Ecosystem     Ecosystem
Level2    Region1       Region1       Region2       Region2       Region2
Level1    Population1   Population2   Population3   Population4   Population5

For the above simple example, input data for the R function iDIP (code given below) are shown below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))
    # Run R function
    iDIP(Data, Struc)
Output is given below.

                     [,1]
D_gamma             5.272
D_alpha2            4.679
D_alpha1            3.540
D_beta2             1.127
D_beta1             1.322
Proportion2         0.300
Proportion1         0.700
Differentiation2    0.204
Differentiation1    0.310
We give simple interpretations for the above output (“effective” is always meant in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 × 1.127 = 5.272 (= D_gamma).
(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 × 3.540 = 4.679 allele equivalents per region (= D_alpha2).
(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
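The reported values can be reproduced step by step from the example data. The following R sketch recomputes the quantities directly from Eqs. (C2), (C5) and (C6) rather than calling iDIP; it assumes, as in the iDIP listing, that weights are relative population sizes, and the object names are illustrative.

```r
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Example data: alleles (rows) x populations (columns).
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c("Region1", "Region1", "Region2", "Region2", "Region2")

w_pop <- colSums(Data) / sum(Data)                       # population weights
reg_abun <- sapply(split(seq_len(ncol(Data)), region),   # pooled counts per region
                   function(idx) rowSums(Data[, idx, drop = FALSE]))
w_reg <- colSums(reg_abun) / sum(reg_abun)               # region weights

H_gamma  <- shannon(rowSums(Data) / sum(Data))
H_alpha2 <- sum(w_reg * apply(reg_abun, 2, function(x) shannon(x / sum(x))))
H_alpha1 <- sum(w_pop * apply(Data,     2, function(x) shannon(x / sum(x))))

D_gamma  <- exp(H_gamma)
D_alpha2 <- exp(H_alpha2)
D_alpha1 <- exp(H_alpha1)
D_beta2  <- D_gamma / D_alpha2
D_beta1  <- D_alpha2 / D_alpha1

Diff2 <- (H_gamma - H_alpha2) / shannon(w_reg)                     # Eq. (C5)
Diff1 <- (H_alpha2 - H_alpha1) / (shannon(w_pop) - shannon(w_reg)) # Eq. (C6)

print(round(c(D_gamma = D_gamma, D_alpha2 = D_alpha2, D_alpha1 = D_alpha1,
              D_beta2 = D_beta2, D_beta1 = D_beta1,
              Differentiation2 = Diff2, Differentiation1 = Diff1), 3))
```

The denominator of `Diff1` uses the identity $-\sum_k \sum_j w_{jk} \ln(w_{jk}/w_{+k}) = -\sum_k \sum_j w_{jk} \ln w_{jk} + \sum_k w_{+k} \ln w_{+k}$, which is the same form used in the iDIP listing.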
    # R code for Information-based Diversity Partitioning for Any Number of Levels
    # Input data:
    #   abun:  alleles-by-population frequency matrix (or species-by-community frequency matrix)
    #   struc: level-by-population hierarchical structure matrix (or level-by-community
    #          hierarchical structure matrix)
    # Output:
    #   (1) Gamma (or total) diversity, alpha and beta diversity at each level
    #   (2) Proportion of total beta information found at each level
    #   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
    #       measures the mean dissimilarity among populations (level 1) within a region;
    #       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
    # NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
    #       or "NA" in your data by 0 or any numerical value. Also, there must be at least
    #       one species/allele in a population or a community.

    # Main function iDIP
    iDIP = function(abun, struc) {
      n = sum(abun); N = ncol(abun)
      ga = rowSums(abun)
      gp = ga[ga > 0] / n                       # pooled (ecosystem) frequencies
      G = sum(-gp * log(gp))                    # gamma entropy
      S = length(gp)
      H = nrow(struc)                           # number of hierarchical levels
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      # Lowest level (populations/communities)
      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[H - 1] = sum(wi * Ai)

      # Intermediate levels (regions, subregions, ...)
      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          ai = matrix(0, ncol = NN, nrow = nrow(abun))
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              ai[, j] = abun[, II]
            } else {
              ai[, j] = rowSums(abun[, II])
            }
          }
          pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
          wi = colSums(ai) / sum(ai)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
          A[i - 1] = sum(wi * Ai)
        }
      }

      # Differentiation, proportion of beta information, and beta diversity per level
      total = G - A[H - 1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
      out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) <- c(paste0("D_gamma"),
                         paste0("D_alpha", (H - 1):1),
                         paste0("D_beta", (H - 1):1),
                         paste0("Proportion", (H - 1):1),
                         paste0("Differentiation", (H - 1):1))
      return(out)
    }
Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called “Abun” and “Struc”, respectively) described for species diversity, we also need to input a phylogenetic “Tree” (in Newick tree format).
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the “Abun” matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see the simple example given above for species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    rownames(Data) = paste0("Allele",1:6)
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
    Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")
    # Run R function
    iDIPphylo(Data, Struc, Tree)
Output is given below.

                 [,1]
Faith's PD    321.525
mean_T         94.169
PD_gamma      274.388
PD_alpha2     255.194
PD_alpha1     223.231
PD_beta2        1.075
PD_beta1        1.143
PD_prop2        0.351
PD_prop1        0.649
PD_diff2        0.124
PD_diff1        0.149
The “effective” in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith’s PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 × 1.075 = 274.388 (= PD_gamma).
(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 × 1.143 = 255.194 (= PD_alpha2).
(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.5%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
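The first three output values can be recomputed by hand without any phylogenetic package. The sketch below assumes the tree topology ((Allele1, Allele2), Allele3) and (Allele4, Allele5, Allele6) and branch lengths read from the example Newick string with decimal points restored; the membership matrix, the labels `node12`, `node123` and `node456`, and the other object names are hypothetical and introduced only for illustration.

```r
# Branch lengths L_i: six tips followed by the three interior branches.
L <- c(Allele1 = 16.66254448, Allele2 = 28.86156926, Allele3 = 59.19367445,
       Allele4 = 9.67060281,  Allele5 = 49.65919121, Allele6 = 15.361314,
       node12 = 43.70264926, node123 = 43.49065302, node456 = 54.92297125)

# Node-by-tip membership: which alleles descend from each branch.
membership <- rbind(diag(6),
                    c(1, 1, 0, 0, 0, 0),   # ancestor of Allele1 and Allele2
                    c(1, 1, 1, 0, 0, 0),   # ancestor of Allele1, Allele2 and Allele3
                    c(0, 0, 0, 1, 1, 1))   # ancestor of Allele4, Allele5 and Allele6

Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))

p <- rowSums(Data) / sum(Data)            # pooled allele frequencies
a <- as.vector(membership %*% p)          # node abundances a_i in the ecosystem

faith_pd <- sum(L[a > 0])                 # total branch length
mean_T   <- sum(L * a)                    # abundance-weighted mean root-to-tip distance
PD_gamma <- exp(-sum(L * (a / mean_T) * log(a / mean_T)))   # gamma PD of order 1

print(round(c(faith_pd = faith_pd, mean_T = mean_T, PD_gamma = PD_gamma), 3))
```

Under these assumptions the three quantities agree with the reported Faith’s PD, mean_T and PD_gamma up to rounding.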
    # R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
    # Input data:
    #   abun:  alleles-by-population frequency matrix (or species-by-community frequency matrix)
    #   struc: level-by-population hierarchical structure matrix (or level-by-community
    #          hierarchical structure matrix)
    #   tree:  a Newick-format phylogenetic tree spanned by all focal species considered in a study
    # NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank"
    #       or "NA" in your data by 0 or any numerical value. Also, there must be at least
    #       one species/allele in a population or a community.
    # Output:
    #   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
    #   (2) mean_T: weighted (by species abundance) mean of the distances from the root node
    #       to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
    #   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level
    #   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information
    #       found in Level 1 and Level 2, respectively
    #   (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1
    #       measures the mean phylogenetic dissimilarity among communities (Level 1) within a
    #       region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among
    #       regions (Level 2), etc.

    # Three packages ("ade4", "ape" and "phytools") must be installed first
    install.packages("ade4")
    library(ade4)
    install.packages("ape")
    library(ape)
    install.packages("phytools")
    library(phytools)

    # Main function iDIPphylo
    iDIPphylo = function(abun, struc, tree) {
      phyloData <- newick2phylog(tree)
      Temp <- as.matrix(abun[names(phyloData$leaves), ])
      nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
      # M[i, b] = 1 if branch b lies on the path from the root to leaf i
      M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
                 dimnames = list(names(phyloData$leaves), nodenames))
      for (i in 1:length(phyloData$leaves)) {
        M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
      }

      # pA: node/branch abundances per population; pB: branch lengths
      pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
                  dimnames = list(nodenames, colnames(abun)))
      for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
      pB = c(phyloData$leaves, phyloData$nodes)
      n = sum(abun); N = ncol(abun)
      ga = rowSums(pA)
      gp = ga / n; TT = sum(gp * pB)            # TT: abundance-weighted mean branch length
      G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
      PD = sum(pB[gp > 0])                      # Faith's PD

      H = nrow(struc)
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      # Lowest level (populations/communities)
      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[H - 1] = sum(wi * Ai)

      # Intermediate levels (regions, subregions, ...)
      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
            } else {
              pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
            }
          }
          # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))   # inactive: 'ai' is not defined in this function
          wi = ni / sum(ni)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
          A[i - 1] = sum(wi * Ai)
        }
      }

      total = G - A[H - 1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      # Gamma = exp(G) / TT; Alpha = exp(A) / TT; Diff = Diff; Prop = Prop   # inactive alternative scaling
      Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
      out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
      # out1 = iDIP(abun, struc)       # inactive: 'out2' is not defined in this function
      # out = cbind(out1, out2)
      rownames(out) <- c(paste("Faith's PD"),
                         paste("mean_T"),
                         paste0("PD_gamma"),
                         paste0("PD_alpha", (H - 1):1),
                         paste0("PD_beta", (H - 1):1),
                         paste0("PD_prop", (H - 1):1),
                         paste0("PD_diff", (H - 1):1))
      return(out)
    }
populations each). Note that here we used a hierarchy with four spatial subdivisions instead of three levels as used in the presentation of the framework. This decision was based on the fact that we wanted to simplify the presentation of calculations (three levels used), and in the simulations (four levels used) we wanted to verify the performance of the framework in a more in-depth manner.
We explored six scenarios varying in the degree of genetic structuring, from very strong (Figure 2, top left panel) to very weak (Figure 2, bottom right panel), and for each we generated spatially structured genetic data for 10 unlinked bi-allelic loci using an algorithm loosely based on the genetic model of Coop, Witonsky, Di Rienzo, and Pritchard (2010). More explicitly, to generate correlated allele frequencies across populations for bi-allelic loci, we draw 10 random vectors of dimension 32 from a multivariate normal distribution with mean zero and a covariance matrix corresponding to the particular genetic structure scenario being considered. To construct the covariance matrix, we first assumed that the covariance between populations decreased with distance, so that the off-diagonal elements (covariances) for closest geographic neighbours were set to 4, for the second nearest neighbours were set to 3, and so on; as such, the main diagonal values (variance) were set to 5. By multiplying the off-diagonal elements of this variance–covariance matrix by a constant (δ), we manipulated the strength of the spatial genetic structure from strong (δ = 0.1; Figure 2) to weak (δ = 6; Figure 2). Delta values were chosen to demonstrate gradual changes in estimates across diversity components. Using this procedure, we generated a matrix of random normally distributed N(0, 1) deviates $\varepsilon_{il}$ for each population i and locus l. The random deviates were transformed into allele frequencies constrained between 0 and 1 using the simple transform:

$$
p_{il} =
\begin{cases}
0 & \text{if } \varepsilon_{il} < 0 \\
\varepsilon_{il} & \text{if } 0 \le \varepsilon_{il} \le 1 \\
1 & \text{if } \varepsilon_{il} > 1
\end{cases}
$$

where $p_{il}$ is the relative frequency of allele A1 at the lth locus in population i, and therefore $q_{il} = (1 - p_{il})$ is the relative frequency of allele A2. Each bi-allelic locus was analysed separately by our framework, and estimated values of $D_\gamma$, $D_\alpha$ and $D_\beta$ for each spatial level (see Figure 1) were averaged across the 10 loci.

To simulate a realistic distribution of number of individuals across populations, we generated random values from a log-normal distribution with mean 0 and log of standard deviation 1; these values were then multiplied by randomly generated deviates from a Poisson distribution with λ = 30 to obtain a wide range of population/community sizes. Rounded values (to mimic abundances of individuals) were then multiplied by $p_{il}$ and $q_{il}$ to generate allele abundances. Given that number of individuals was randomly generated across populations, there is no spatial correlation in abundance of individuals across the landscape, which means that the genetic spatial patterns were solely determined by the variance–covariance matrix used to generate correlated allele frequencies across populations. This facilitates interpretation of the simulation results, allowing us to demonstrate that the framework can uncover subtle spatial effects associated with population connectivity (see below).

For each spatial structure, we generated 100 matrices of allele frequencies, and each matrix was analysed separately to obtain distributions for $D_\gamma$, $D_\alpha$, $D_\beta$ and $\Delta_D$. Figure 2 presents heat maps of the correlation in allele frequencies across populations for one simulated data set under each δ value and shows that our algorithm can generate a wide range of genetic structures comparable to those generated by other more complex simulation protocols (e.g., de Villemereuil, Frichot, Bazin, Francois, & Gaggiotti, 2014).

Figure 3 shows the distribution of $D_\alpha$, $D_\beta$ and $\Delta_D$ values for the three levels of geographic variation below the ecosystem level (i.e., $D_\gamma$, genetic diversity). The results clearly show that our framework detects differences in genetic diversity across different levels of spatial genetic structure. As expected, the effective number of alleles ($D_\alpha$ component, top row) increases per region and subregion as the spatial structure becomes weaker (i.e., from small to large δ values) but remains constant at the population level, as there is no spatial structure at this level (i.e., populations are panmictic), so diversity is independent of δ.

The $D_\beta$ component (middle row) quantifies the effective number of aggregates (regions, subregions, populations) at each hierarchical level of spatial subdivision. The larger the number of aggregates at a given level, the more heterogeneous that level is. Thus, it is also a measure of compositional dissimilarity at each level. We use this interpretation to describe the results in a more intuitive manner. As expected, as δ increases, dissimilarity between regions (middle left panel) decreases because spatial genetic structure becomes weaker, and the compositional dissimilarity among populations within subregions (middle right panel) increases because the strong spatial correlation among populations within subregions breaks down (Figure 3, centre left panel). The compositional dissimilarity between subregions within regions (Figure 3, middle centre panel) first increases and then decreases with increasing δ. This is due to an “edge effect” associated with the marginal status of the subregions at the extremes of the species range (extreme right and left subregions in Figure 2). As δ increases, the composition of the two subregions at the centre of the species range, which belong to different regions, changes more rapidly than that of the two marginal subregions. Thus, the compositional dissimilarity between subregions within regions increases. However, as δ continues to increase, spatial effects disappear and dissimilarity decreases.
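As a small illustration of the clamping transform for $p_{il}$ and of the alpha, beta and gamma components for a single bi-allelic locus, the following R sketch may help. It is not the simulation code used in the study: the deviates in `eps`, the equal population weights and the helper names `shannon` and `clamp01` are assumptions made only for this example.

```r
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Clamp deviates to [0, 1] to obtain the frequency of allele A1.
clamp01 <- function(eps) pmin(pmax(eps, 0), 1)

eps  <- c(-0.2, 0.3, 0.5, 1.4)            # hypothetical deviates, four populations, one locus
p    <- clamp01(eps)                      # frequency of A1 in each population
freq <- rbind(A1 = p, A2 = 1 - p)         # alleles x populations
w    <- rep(1 / length(p), length(p))     # equal population weights (assumption)

D_gamma <- exp(shannon(as.vector(freq %*% w)))        # effective alleles in the pooled set
D_alpha <- exp(sum(w * apply(freq, 2, shannon)))      # effective alleles per population
D_beta  <- D_gamma / D_alpha                          # effective number of populations

print(round(c(D_gamma = D_gamma, D_alpha = D_alpha, D_beta = D_beta), 3))
```

For a bi-allelic locus, `D_gamma` cannot exceed 2, and `D_beta` lies between 1 (identical populations) and the number of populations.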
FIGURE 2 Heat maps of allele frequency correlations between pairs of populations for different δ values. Delta values control the strength of the spatial genetic structure among populations, with low δs having the strongest spatial correlation among populations. Each heat map represents the outcome of a single simulation, and each dot represents the allele frequency correlation between two populations. Thus, the diagonal represents the correlation of a population with itself and is always 1 regardless of the δ value considered in the simulation. Colours indicate range of correlation values. As in Figure 1, the dendrograms represent the spatial relationship (i.e., geographic distance) between populations.
The differentiation component ΔD (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity Dβ. This is expected, as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix, in which different δ values would be used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength of spatial genetic variation.

FIGURE 3 Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity, gamma, is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions)
For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ, from Dγ = 1.6 on average across simulations for δ = 0.1 up to Dγ = 1.9 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here, regions, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as those we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.
7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM
All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish, Etelis coruscans (Andrews et al., 2014), and a shallow-water fish, Zebrasoma flavescens (Eble et al., 2011).
The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km south-west of the MHI, at Johnston Atoll, and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).

FIGURE 4 Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass
We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.
7.1 | Species diversity
Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, Dγ, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_\alpha^{(2)} = 37.77$; Island: $D_\alpha^{(1)} = 27.75$). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_\beta^{(2)} = 1.29$ represents the number of region equivalents in the Hawaiian archipelago, while $D_\beta^{(1)} = 1.361$ is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on the sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain ΔD (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
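The numbers quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes the usual multiplicative partitioning at each level (gamma = alpha × beta), which is what the entries of Table 4 suggest; the variable names are ours, and the formula for ΔD in Table 3 is not reproduced here.

```python
# Order q = 1 fish species diversities transcribed from Table 4.
D_GAMMA = 48.744         # whole Hawaiian Archipelago
D_ALPHA_REGION = 37.773  # effective number of species per region
D_ALPHA_ISLAND = 27.752  # effective number of species per island

# Assumed multiplicative partitioning: beta = gamma / alpha at each level.
beta_region = D_GAMMA / D_ALPHA_REGION         # region equivalents
beta_island = D_ALPHA_REGION / D_ALPHA_ISLAND  # island equivalents per region

# Species equivalents lost when descending one level in the hierarchy.
lost_to_region = D_GAMMA - D_ALPHA_REGION
lost_to_island = D_ALPHA_REGION - D_ALPHA_ISLAND

# The three components multiply back to the total diversity.
reconstructed_gamma = D_ALPHA_ISLAND * beta_island * beta_region
```

Under this assumption the ratios agree with the tabulated beta diversities to three decimals, and both losses are close to the "approximately 10 species equivalents" mentioned in the text.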
We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than the diversities of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and, to a lesser extent, Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
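To make the comparison across orders concrete, the following sketch computes Hill numbers of order q = 0, 1, 2 for two made-up communities. The abundance vectors are hypothetical (only the 55.1% share of the dominant species echoes the Nihoa figure quoted above); they are not the survey data.

```python
import math


def hill_number(abundances, q):
    """Effective number of species (Hill number) of order q."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if q == 1:
        # Limit as q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


# Hypothetical community: one species holds 55.1% of individuals,
# the remaining 44.9% is split evenly among 40 rare species.
dominated = [55.1] + [44.9 / 40] * 40
# Hypothetical perfectly even community with the same richness.
even = [1.0] * 41

profile_dominated = [hill_number(dominated, q) for q in (0, 1, 2)]
profile_even = [hill_number(even, q) for q in (0, 1, 2)]
```

For the even community the three orders coincide with species richness, whereas for the dominated community the profile falls steeply from q = 0 to q = 2 because rare species contribute less and less as q increases.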
Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.
TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

Level                      Diversity
3 Hawaiian Archipelago     $D_\gamma = 48.744$
2 Region                   $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)} = 37.773$, $D_\beta^{(2)} = 1.290$
1 Island (community)       $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)} = 27.752$, $D_\beta^{(1)} = 1.361$

Differentiation among aggregates at each level
2 Region                   $\Delta_D^{(2)} = 0.290$
1 Island (community)       $\Delta_D^{(1)} = 0.153$

7.2 | Genetic diversity

Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases, genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that, despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and therefore may lead to large differences in larval dispersal potential between the two species.
Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.
As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).
8 | DISCUSSION
Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications, including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.
Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and therefore give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g., bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts the Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.

FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities, (b) genetic diversity for Etelis coruscans, (c) genetic diversity for Zebrasoma flavescens
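The contrast between heterozygosity (order q = 2) and the lower orders can be illustrated with a toy bottleneck. In the sketch below, the allele frequencies are invented for illustration, and the "bottleneck" is modelled crudely as the loss of the two rarest alleles followed by renormalization; the identity D2 = 1/(1 − He) used in the last lines is the standard link between expected heterozygosity and the Hill number of order 2.

```python
import math


def hill_number(freqs, q):
    """Hill number of order q for a vector of allele frequencies summing to 1."""
    p = [f for f in freqs if f > 0]
    if q == 1:
        return math.exp(-sum(f * math.log(f) for f in p))
    return sum(f ** q for f in p) ** (1.0 / (1.0 - q))


def expected_heterozygosity(freqs):
    """Probability that two randomly drawn allele copies differ."""
    return 1.0 - sum(f * f for f in freqs)


# Hypothetical locus before a bottleneck: three common and two rare alleles.
before = [0.50, 0.30, 0.10, 0.05, 0.05]
# Hypothetical outcome: the two rare alleles are lost, the rest renormalized.
survivors = before[:3]
after = [f / sum(survivors) for f in survivors]

relative_drop = {
    q: 1.0 - hill_number(after, q) / hill_number(before, q) for q in (0, 1, 2)
}

# Heterozygosity is a transformation of the order-2 Hill number.
d2_from_he = 1.0 / (1.0 - expected_heterozygosity(before))
```

In this example the proportional loss of diversity is largest for q = 0, intermediate for q = 1 and smallest for q = 2, which is the sense in which heterozygosity-based measures respond weakly to the loss of rare alleles.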
As proved by Chao et al. (2015, appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost's D, do not possess either of these two properties.
Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.

FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness, (b) allelic richness for Etelis coruscans, (c) allelic richness for Zebrasoma flavescens
[Panel titles: (a) Species diversity; (b) Genetic diversity, E. coruscans; (c) Genetic diversity, Z. flavescens]

TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level                      Diversity
3 Hawaiian Archipelago     $D_\gamma = 8.249$
2 Region                   $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)} = 8.083$, $D_\beta^{(2)} = 1.016$
1 Island (population)      $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)} = 7.077$, $D_\beta^{(1)} = 1.117$

Differentiation among aggregates at each level
2 Region                   $\Delta_D^{(2)} = 0.023$
1 Island (community)       $\Delta_D^{(1)} = 0.062$
Recently, Karlin and Smouse (2017; Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
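The two-population example can be verified numerically. The sketch below implements the two-level case only, that is, the normalized mutual information (Shannon differentiation) mentioned above, and assumes equal population weights; the function names and allele labels are ours, and the general multi-level formulas of Table 3 are not reproduced.

```python
import math


def shannon_entropy(freqs):
    """Shannon entropy (natural log) of a frequency vector."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def shannon_differentiation(populations):
    """Normalized mutual information between allele identity and population,
    assuming equal weights. Each population is a dict allele -> frequency."""
    n = len(populations)
    alleles = set().union(*populations)
    pooled = [sum(pop.get(a, 0.0) for pop in populations) / n for a in alleles]
    h_gamma = shannon_entropy(pooled)
    h_alpha = sum(shannon_entropy(pop.values()) for pop in populations) / n
    return (h_gamma - h_alpha) / math.log(n)


# Populations I and II: 10 equally frequent alleles each, 4 of them shared.
shared = [f"shared_{i}" for i in range(4)]
pop_I = {a: 0.1 for a in shared + [f"only_I_{i}" for i in range(6)]}
pop_II = {a: 0.1 for a in shared + [f"only_II_{i}" for i in range(6)]}

delta = shannon_differentiation([pop_I, pop_II])
true_nonshared = 1 - 4 / 10  # 1 - A/S
```

Here the pooled entropy exceeds the mean within-population entropy by exactly 0.6 ln 2, so dividing by ln 2 recovers the nonshared proportion 1 − A/S = 0.6.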
In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would surely be observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., obtained by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
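The negative bias of the observed (plug-in) entropy is easy to see in a small simulation. The sketch below uses a hypothetical community of 50 equally abundant species and deliberately small samples; it only illustrates the direction of the bias and does not implement the Chao and Jost (2015) estimator.

```python
import math
import random
from collections import Counter


def plug_in_entropy(sample):
    """Observed Shannon entropy: sample proportions substituted into the formula."""
    n = len(sample)
    return -sum((c / n) * math.log(c / n) for c in Counter(sample).values())


rng = random.Random(1)  # fixed seed so the sketch is reproducible
n_species = 50          # hypothetical community, all species equally abundant
sample_size = 30        # severe undersampling: fewer individuals than species
true_entropy = math.log(n_species)

estimates = [
    plug_in_entropy([rng.randrange(n_species) for _ in range(sample_size)])
    for _ in range(200)
]
mean_estimate = sum(estimates) / len(estimates)
```

With 30 individuals the plug-in entropy can never exceed ln 30, which is below the true value ln 50, so every replicate underestimates the true entropy in this setting.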
Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity Dβ and differentiation ΔD measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.
The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.
Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

Level                      Diversity
3 Hawaiian Archipelago     $D_\gamma = 8.404$
2 Region                   $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)} = 8.290$, $D_\beta^{(2)} = 1.012$
1 Island (community)       $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)} = 7.690$, $D_\beta^{(1)} = 1.065$

Differentiation among aggregates at each level
2 Region                   $\Delta_D^{(2)} = 0.014$
1 Island (community)       $\Delta_D^{(1)} = 0.033$
We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization, and therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
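The percentage differences quoted above can be approximately recovered from Tables 4 to 6. The sketch below assumes that each percentage is the drop from the regional to the island alpha diversity, expressed relative to the regional value; this reading is our assumption, and small discrepancies with the quoted figures are presumably due to rounding in the tables.

```python
def relative_drop_percent(regional, island):
    """Percentage decrease from regional to island alpha diversity (order q = 1)."""
    return 100.0 * (regional - island) / regional


# (regional alpha, island alpha) transcribed from Tables 4, 5 and 6.
alpha_diversities = {
    "fish species": (37.773, 27.752),
    "E. coruscans alleles": (8.083, 7.077),
    "Z. flavescens alleles": (8.290, 7.690),
}

drops = {
    name: relative_drop_percent(regional, island)
    for name, (regional, island) in alpha_diversities.items()
}
```

Under this reading, the drop for species diversity is more than twice that for either set of microsatellite loci, which is the discrepancy the paragraph describes.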
In our hierarchical framework and analysis based on the Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)'s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)'s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals, or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
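The "simple transformation of the phylogenetic entropy" can be sketched for a single assemblage as follows. We assume, following our reading of Chao et al. (2010) and Allen et al. (2009), that for an ultrametric tree of height T the phylogenetic entropy is H_P = −Σ L_i a_i ln a_i (L_i is the length of branch i and a_i the total relative abundance descending from it) and that the mean phylogenetic diversity of order 1 is exp(H_P / T). The two toy trees are hypothetical, and the hierarchical decomposition of Table 3 is not reproduced.

```python
import math


def mean_phylogenetic_diversity_q1(branches, tree_height):
    """Order-1 mean phylogenetic diversity for an ultrametric tree.

    branches: iterable of (branch_length, relative abundance descending from it).
    """
    phylogenetic_entropy = -sum(
        length * a * math.log(a) for length, a in branches if a > 0
    )
    return math.exp(phylogenetic_entropy / tree_height)


# Hypothetical star phylogeny: 4 equally abundant species, each on a branch of length 2.
star_tree = [(2.0, 0.25)] * 4

# Hypothetical tree of height 2 with two pairs of sister species:
# four tip branches of length 1 and two internal branches of length 1.
paired_tree = [(1.0, 0.25)] * 4 + [(1.0, 0.5)] * 2

d_star = mean_phylogenetic_diversity_q1(star_tree, tree_height=2.0)
d_paired = mean_phylogenetic_diversity_q1(paired_tree, tree_height=2.0)
```

With a star phylogeny all species are equally distinct and the measure coincides with the ordinary Hill number of order 1 (four species equivalents here); shared ancestry in the second tree lowers the effective number of lineages below four.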
There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level is transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithm may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.
The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.
An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.
ACKNOWLEDGEMENTS
This work was assisted through participation in the "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from the NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by the National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.
DATA ARCHIVING STATEMENT
All data used in this manuscript are available in DRYAD (https://doi.org/dx.doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).
ORCID
Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493
Christine Edwards http://orcid.org/0000-0001-8837-4872
REFERENCES
Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2
Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101
Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.
Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665
Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.
Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.
Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654
Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551
Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247
Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272
Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540
Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471
Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014
Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819
Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083
Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1
Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2
Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036
Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75
Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352
Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x
Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x
Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1
Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x
Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum x falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.
Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.
Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.
Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3
MacArthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x
Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.
Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x
Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x
Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x
Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.
Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x
Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.
Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.
Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696
Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B: Biological Sciences, 283(1829).
Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a “pristine” coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z
Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765
Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x
Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.
Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422
Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756
de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.
Whitlock, M. C. (2011). G’ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x
Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516
Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.
Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593
APPENDIX S1. Effective number of alleles: A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an “ideal” population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the “effective” number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
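The table can be checked numerically. The following is a minimal sketch, assuming the standard definitions He = 1 − Σ p_i² and effective number of alleles = 1 / Σ p_i² = 1 / (1 − He); the function names are hypothetical and are not part of the authors' code.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity, He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def effective_number_of_alleles(freqs):
    """Number of equally frequent alleles that would give the same He."""
    return 1.0 / sum(p * p for p in freqs)


# Allele frequencies A1..A5 taken from the table in Appendix S1
pop1 = [0.5, 0.5, 0.0, 0.0, 0.0]
pop2 = [0.01, 0.10, 0.665, 0.215, 0.01]

he1 = expected_heterozygosity(pop1)
he2 = expected_heterozygosity(pop2)
ne1 = effective_number_of_alleles(pop1)
ne2 = effective_number_of_alleles(pop2)

print(f"He:  pop1 = {he1:.4f}, pop2 = {he2:.4f}")
print(f"Effective alleles: pop1 = {ne1:.4f}, pop2 = {ne2:.4f}")
```

Both populations come out at He of about 0.5 and an effective number of about two alleles, even though population 2 carries five distinct alleles.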
APPENDIX S2. Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol    Definition
$^{q}D$    abundance diversity of order q, also referred to as Hill number of order q
$^{q}PD$    phylogenetic diversity of order q
$S$    total number of distinct elements (species/alleles)
$H$    Shannon entropy
$K$    number of regions
$J_k$    number of local populations/communities within region k
$w_{jk}$    weight given to population/community j of region k
$w_{+k}$    weight given to region k
$D_\alpha^{(l)}$    alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$    beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$    gamma abundance diversity at level l of the hierarchy
$l$    superscript to denote population/community, subregion, region, …
$D_\gamma$    gamma abundance diversity at the ecosystem level
$N_{injk}$    number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$    total number of copies of allele/species i in population/community j of region k
$N_{+jk}$    total number of alleles/individuals in population j of region k
$N_{++k}$    total number of alleles/individuals in region k
$N_{+++}$    total number of alleles/individuals in the ecosystem
$p_{i|jk}$    frequency of allele/species i in population j of region k
$p_{i|+k}$    frequency of allele/species i in region k
$p_{i|++}$    frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$    alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$    alpha entropy for region k
$H_\alpha^{(l)}$    total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$    abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$    number of branch segments in the phylogenetic tree
$L_i$    length of branch i = 1, 2, 3, …, B
$a_i$    total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$    mean branch length
$T$    depth of an ultrametric tree (= $\bar{T}$)
$Q$    Rao's quadratic entropy
$I$    phylogenetic entropy
$a_{i|jk}$    total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$    total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$    total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$    alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$    beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$    gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$    gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$    alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$    alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$    total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$    phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)
APPENDIX S3. Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term “allele frequency” with “species frequency”. We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \dots, w_J)$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as
$$H_\gamma = -\sum_{i=1}^{S} p_{i|+}\ln p_{i|+} = -\sum_{i=1}^{S}\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big)\ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)$$
and
$$H_\alpha = -\sum_{j=1}^{J} w_j\sum_{i=1}^{S} p_{i|j}\ln p_{i|j}.$$
Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:
$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j\ln w_j.$$
When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j\ln w_j$.
Proof. Since $f(x) = -x\ln x$ is a concave function, it follows from the Jensen inequality that for any allele i we have
$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big)\ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j}\ln p_{i|j}.$$
Summing over all alleles, we then obtain
$$-\sum_{i=1}^{S}\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big)\ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j\sum_{i=1}^{S} p_{i|j}\ln p_{i|j}.$$
This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows. Because $\sum_{m=1}^{J} w_m p_{i|m} \ge w_j p_{i|j}$ for every j, we have
$$H_\gamma = -\sum_{i=1}^{S}\Big[\sum_{j=1}^{J} w_j p_{i|j}\ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)\Big] \le -\sum_{i=1}^{S}\Big[\sum_{j=1}^{J} w_j p_{i|j}\ln\big(w_j p_{i|j}\big)\Big]$$
$$= -\sum_{j=1}^{J} w_j\sum_{i=1}^{S} p_{i|j}\ln p_{i|j} - \sum_{j=1}^{J} w_j\sum_{i=1}^{S} p_{i|j}\ln w_j = H_\alpha - \sum_{j=1}^{J} w_j\ln w_j.$$
When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as
$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j\ln w_j}. \tag{C1}$$
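Eq. (C1) is easy to evaluate numerically. The sketch below is an illustration only: the function names and the toy frequency vectors are hypothetical, and equal community weights are assumed for the shared-species case. It evaluates the measure for identical populations, for completely distinct populations, and for two communities that each have S = 4 equally common species, A = 1 of which is shared.

```python
import math


def shannon_entropy(freqs):
    """Shannon entropy (natural log); zero frequencies contribute nothing."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def shannon_differentiation(pops, weights):
    """Eq. (C1): (H_gamma - H_alpha) / (-sum_j w_j ln w_j)."""
    n_alleles = len(pops[0])
    # Pooled (ecosystem) frequencies p_{i|+} = sum_j w_j p_{i|j}
    pooled = [sum(w * pop[i] for w, pop in zip(weights, pops))
              for i in range(n_alleles)]
    h_gamma = shannon_entropy(pooled)
    h_alpha = sum(w * shannon_entropy(pop) for w, pop in zip(weights, pops))
    return (h_gamma - h_alpha) / shannon_entropy(weights)


identical = [[0.2, 0.3, 0.5], [0.2, 0.3, 0.5]]
distinct = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.3, 0.7]]
# Two communities, S = 4 equally common species each, A = 1 shared species
shared_one = [[0.25, 0.25, 0.25, 0.25, 0.0, 0.0, 0.0],
              [0.25, 0.0, 0.0, 0.0, 0.25, 0.25, 0.25]]

d_identical = shannon_differentiation(identical, [0.4, 0.6])
d_distinct = shannon_differentiation(distinct, [0.4, 0.6])
d_shared = shannon_differentiation(shared_one, [0.5, 0.5])

print(d_identical, d_distinct, d_shared)
```

Up to floating-point error, the three cases give 0, 1 and 1 − A/S = 0.75 respectively.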
This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and “true dissimilarity” properties for Shannon differentiation.
(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation also satisfies the “true dissimilarity” property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.

Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, the Shannon gamma and alpha entropies are expressed as
$$H_\gamma = -\sum_{i=1}^{S} p_{i|++}\ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k},$$
$$H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk}.$$
(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is
$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \tag{C2}$$
Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived, so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:
$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k}\ln w_{+k}, \tag{C3}$$
$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}}. \tag{C4}$$
When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\ln w_{+k}$.
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $(w_{+1}, w_{+2}, \dots, w_{+K})$. Note that we can express $p_{i|++}$ as
$$p_{i|++} = \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk} = \sum_{k=1}^{K} w_{+k}\,p_{i|+k}.$$
Then the “gamma” entropy for this two-level system is
$$H_\gamma = -\sum_{i=1}^{S}\Big(\sum_{k=1}^{K} w_{+k}\,p_{i|+k}\Big)\ln\Big(\sum_{m=1}^{K} w_{+m}\,p_{i|+m}\Big).$$
The corresponding “alpha” entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k}\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.
To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele i in population j, with population weights $(w_{1k}/w_{+k},\ w_{2k}/w_{+k},\ \dots,\ w_{J_k k}/w_{+k})$, $j = 1, 2, \dots, J_k$. The “gamma” entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k}(w_{jk}/w_{+k})\,p_{i|jk}$. The corresponding “alpha” entropy is
$$\sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk}.$$
Then Theorem S3.1 leads to
$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k} + \sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk} \le -\sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\ln\frac{w_{jk}}{w_{+k}}.$$
Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain
$$-\sum_{k=1}^{K} w_{+k}\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k} + \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}}.$$
This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:
$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k}\ln w_{+k}}. \tag{C5}$$
Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:
$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln(w_{jk}/w_{+k})}. \tag{C6}$$
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as
$$H_\gamma = -\sum_{i=1}^{S}\Big[\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln\big(w_{jk}\,p_{i|jk}\big)\Big]$$
$$= -\sum_{i=1}^{S}\Big[\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\Big(\ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big)\Big]$$
$$= -\sum_{i=1}^{S}\Big[\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln p_{i|jk}\Big] - \sum_{i=1}^{S}\Big[\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln\frac{w_{jk}}{w_{+k}}\Big] - \sum_{i=1}^{S}\Big[\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln w_{+k}\Big]$$
$$= H_\alpha^{(1)} - \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k}\ln w_{+k}$$
$$\equiv H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big].$$
In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\ln w_{+k}$.
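The two extreme cases can be confirmed numerically. The following sketch evaluates Eqs. (C5) and (C6) directly from population frequencies and weights; the function name, the nested-list data layout and the toy values are assumptions made for illustration, not the authors' implementation.

```python
import math


def H(values):
    """Shannon entropy (natural log) of a frequency or weight vector."""
    return -sum(x * math.log(x) for x in values if x > 0)


def three_level_differentiation(p, w):
    """Return (Delta_D^(2), Delta_D^(1)) from Eqs. (C5) and (C6).

    p[k][j] is the allele frequency vector of population j in region k;
    w[k][j] is the weight w_jk (all weights sum to 1).
    """
    n_alleles = len(p[0][0])
    w_region = [sum(wk) for wk in w]  # w_{+k}
    # Regional frequencies p_{i|+k} = sum_j (w_jk / w_+k) p_{i|jk}
    p_region = [
        [sum(wjk / ws * pj[i] for wjk, pj in zip(wk, pk)) for i in range(n_alleles)]
        for wk, pk, ws in zip(w, p, w_region)
    ]
    # Ecosystem frequencies p_{i|++} = sum_k w_+k p_{i|+k}
    p_total = [sum(ws * pr[i] for ws, pr in zip(w_region, p_region))
               for i in range(n_alleles)]
    h_gamma = H(p_total)
    h_alpha2 = sum(ws * H(pr) for ws, pr in zip(w_region, p_region))
    h_alpha1 = sum(wjk * H(pj) for wk, pk in zip(w, p) for wjk, pj in zip(wk, pk))
    max2 = H(w_region)
    max1 = -sum(wjk * math.log(wjk / ws)
                for wk, ws in zip(w, w_region) for wjk in wk)
    return (h_gamma - h_alpha2) / max2, (h_alpha2 - h_alpha1) / max1


weights = [[0.1, 0.3], [0.2, 0.4]]
# Completely distinct populations: no allele occurs in more than one population
distinct = [[[1, 0, 0, 0, 0], [0, 0.5, 0.5, 0, 0]],
            [[0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]]
# Identical allele frequency distributions in every population
identical = [[[0.2, 0.8], [0.2, 0.8]], [[0.2, 0.8], [0.2, 0.8]]]

d2_distinct, d1_distinct = three_level_differentiation(distinct, weights)
d2_identical, d1_identical = three_level_differentiation(identical, weights)
print(d2_distinct, d1_distinct, d2_identical, d1_identical)
```

With completely distinct populations both measures evaluate to 1, and with identical populations both evaluate to 0, up to floating-point error.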
As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and “true dissimilarity” properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)
$$I_\gamma = -\sum_{i=1}^{B} L_i\,a_{i|++}\ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\sum_{i=1}^{B} L_i\,a_{i|+k}\ln a_{i|+k},$$
$$I_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\sum_{i=1}^{B} L_i\,a_{i|jk}\ln a_{i|jk}.$$
Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:
$$I_\gamma = I_\alpha^{(1)} + \big[I_\alpha^{(2)} - I_\alpha^{(1)}\big] + \big[I_\gamma - I_\alpha^{(2)}\big]. \tag{C7}$$
Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem, so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:
$$0 \le I_\gamma - I_\alpha^{(2)} \le -T\sum_{k=1}^{K} w_{+k}\ln w_{+k}, \tag{C8}$$
$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}}. \tag{C9}$$
When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln(w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T\sum_{k=1}^{K} w_{+k}\ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and “true dissimilarity” properties, are parallel to those of allelic diversity and thus are omitted.
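SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represents the ecosystem. Shannon entropies, $H_{*}$, are calculated from allele/species abundances at each level, h, of the hierarchy and are then transformed into effective numbers using Eq. 1.

[Figure S1: rows are the hierarchical levels (3 Ecosystem; 2 Region; 1 Population or Community); columns are Within, Between, Total and Decomposition. Level 1 shows frequencies $p_{i|11}, p_{i|21}, p_{i|31}, p_{i|42}, p_{i|52}$ with entropies $H_{\alpha 11}^{(1)}, H_{\alpha 21}^{(1)}, H_{\alpha 31}^{(1)}, H_{\alpha 42}^{(1)}, H_{\alpha 52}^{(1)}$ and total $H_\alpha^{(1)}$; level 2 shows $p_{i|+1}, p_{i|+2}$ with entropies $H_{\alpha 1}^{(2)}, H_{\alpha 2}^{(2)}$ and total $H_\alpha^{(2)}$; level 3 shows $p_{i|++}$ with $H_\gamma$.]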
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for the 8 nodes is as follows: $\{a_1, a_2, \dots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \dots, p_5,\ p_1 + p_2,\ p_1 + p_2 + p_3,\ p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2: ultrametric tree with tips $p_1, p_2, p_3, p_4, p_5$; internal nodes labelled $p_1 + p_2$, $p_1 + p_2 + p_3$ and $p_4 + p_5$; root labelled MRCA.]
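The extended abundance set of Figure S2 can be built mechanically from the tip abundances. The sketch below assumes hypothetical tip abundances and hypothetical branch lengths (chosen so that the tree is ultrametric with depth 3), and computes the order-1 phylogenetic diversity using the same expression as the `G` term in the iDIPphylo code, exp(−Σ L_i (a_i/T) ln(a_i/T)).

```python
import math

# Hypothetical relative abundances p1..p5 of the five tips
p = [0.10, 0.20, 0.30, 0.25, 0.15]

# (branch length, indices of tips descended from the branch).
# Branch lengths are hypothetical; every root-to-tip path sums to 3.
branches = [
    (1.0, [0]), (1.0, [1]), (2.0, [2]), (1.5, [3]), (1.5, [4]),
    (1.0, [0, 1]),       # a6 = p1 + p2
    (1.0, [0, 1, 2]),    # a7 = p1 + p2 + p3
    (1.5, [3, 4]),       # a8 = p4 + p5
]

lengths = [length for length, _ in branches]
# Extended abundance set: a_i = total abundance descended from branch i
a = [sum(p[t] for t in tips) for _, tips in branches]

# Abundance-weighted mean root-to-tip distance (tree depth if ultrametric)
mean_T = sum(L * ai for L, ai in zip(lengths, a))
# Phylogenetic entropy term and order-1 phylogenetic diversity
G = -sum(L * (ai / mean_T) * math.log(ai / mean_T)
         for L, ai in zip(lengths, a) if ai > 0)
pd1 = math.exp(G)
# Faith's PD: total length of branches with non-zero abundance
faith_pd = sum(L for L, ai in zip(lengths, a) if ai > 0)

print(a, mean_T, pd1, faith_pd)
```

For an ultrametric tree the weighted mean distance equals the tree depth, and the order-1 phylogenetic diversity lies between the tree depth and Faith's PD.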
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R function iDIP)

Input should include two data matrices (called “Abun” and “Struc”, respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1: Population 1, Population 2
    Region 2: Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):
           Pop1   Pop2   Pop3   Pop4   Pop5
Allele1    1      16     2      10     15
Allele2    0      0      0      5      14
Allele3    7      12     11     1      0
Allele4    0      5      14     1      21
Allele5    2      1      0      11     10
Allele6    0      1      3      2      0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure of any number of levels can be expressed in a similar manner.

           Pop1          Pop2          Pop3          Pop4          Pop5
Level 3    Ecosystem     Ecosystem     Ecosystem     Ecosystem     Ecosystem
Level 2    Region1       Region1       Region2       Region2       Region2
Level 1    Population1   Population2   Population3   Population4   Population5
For the above simple example, input data for the R function iDIP (code given below) are shown below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))
    # Run R function
    iDIP(Data, Struc)
Output is given below:

                      [,1]
    D_gamma           5.272
    D_alpha2          4.679
    D_alpha1          3.540
    D_beta2           1.127
    D_beta1           1.322
    Proportion2       0.300
    Proportion1       0.700
    Differentiation2  0.204
    Differentiation1  0.310
We give simple interpretations for the above output (“effective” is always in the sense of equally abundant alleles/populations/regions):
(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).
(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).
(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
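These values can be cross-checked independently of the R function. The following Python sketch recomputes the gamma and alpha entropies and the two differentiation measures (Eqs. C5 and C6) for the example data, assuming that population and region weights are proportional to their total allele counts (as in the line `wi = colSums(abun)/n` of iDIP). The variable and function names are hypothetical.

```python
import math


def entropy(counts):
    """Shannon entropy (natural log) of the frequencies implied by counts."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def pool(pops):
    """Sum allele counts across a list of populations."""
    return [sum(col) for col in zip(*pops)]


# Allele counts per population (columns of the example matrix), by region
regions = {
    "Region1": [[1, 0, 7, 0, 2, 0], [16, 0, 12, 5, 1, 1]],
    "Region2": [[2, 0, 11, 14, 0, 3], [10, 5, 1, 1, 11, 2],
                [15, 14, 0, 21, 10, 0]],
}

all_pops = [pop for pops in regions.values() for pop in pops]
n_total = sum(sum(pop) for pop in all_pops)
region_sizes = [sum(map(sum, pops)) for pops in regions.values()]
pop_sizes = [sum(pop) for pop in all_pops]

h_gamma = entropy(pool(all_pops))
# Size-weighted mean entropies at the region and population levels
h_alpha2 = sum(size / n_total * entropy(pool(pops))
               for size, pops in zip(region_sizes, regions.values()))
h_alpha1 = sum(size / n_total * entropy(pop)
               for size, pop in zip(pop_sizes, all_pops))

# Maximum values: -sum w_+k ln w_+k and -sum sum w_jk ln(w_jk / w_+k)
max2 = entropy(region_sizes)
max1 = entropy(pop_sizes) - entropy(region_sizes)

d_gamma = math.exp(h_gamma)
d_alpha2 = math.exp(h_alpha2)
d_alpha1 = math.exp(h_alpha1)
delta2 = (h_gamma - h_alpha2) / max2
delta1 = (h_alpha2 - h_alpha1) / max1

print(round(d_gamma, 3), round(d_alpha2, 3), round(d_alpha1, 3))
print(round(delta2, 3), round(delta1, 3))
```

Under the size-weighting assumption, the results agree with the reported D_gamma, D_alpha2, D_alpha1, Differentiation2 and Differentiation1 to three decimal places.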
    # R code for Information-based Diversity Partitioning for Any Number of Levels
    # Input data:
    #   abun:  alleles-by-population frequency matrix (or species-by-community frequency matrix)
    #   struc: level-by-population hierarchical structure matrix (or level-by-community
    #          hierarchical structure matrix)
    # Output:
    #   (1) Gamma (or total) diversity, alpha and beta diversity at each level
    #   (2) Proportion of total beta information found at each level
    #   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
    #       measures the mean dissimilarity among populations (level 1) within a region;
    #       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
    # NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
    #       or "NA" in your data by 0 or any numerical value. Also, there must be at least
    #       one species/allele in a population or a community.

    # Main function iDIP
    iDIP = function(abun, struc) {
      n = sum(abun); N = ncol(abun)
      ga = rowSums(abun)
      gp = ga[ga > 0] / n
      G = sum(-gp * log(gp))            # gamma entropy
      S = length(gp)
      H = nrow(struc)                   # number of hierarchical levels
      A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
      Diff = numeric(H-1); Prop = numeric(H-1)

      # Population level: weights, weight entropy and alpha entropy
      wi = colSums(abun) / n
      W[H-1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) abun[,k] / sum(abun[,k]))
      Ai = sapply(1:N, function(k) -sum(pi[,k][pi[,k] > 0] * log(pi[,k][pi[,k] > 0])))
      A[H-1] = sum(wi * Ai)

      # Higher levels: pool populations belonging to the same aggregate
      if (H > 2) {
        for (i in 2:(H-1)) {
          I = unique(struc[i,]); NN = length(I)
          ai = matrix(0, ncol = NN, nrow = nrow(abun))
          for (j in 1:NN) {
            II = which(struc[i,] == I[j])
            if (length(II) == 1) { ai[,j] = abun[,II]
            } else { ai[,j] = rowSums(abun[,II]) }
          }
          pi = sapply(1:NN, function(k) ai[,k] / sum(ai[,k]))
          wi = colSums(ai) / sum(ai)
          W[i-1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pi[,k][pi[,k] > 0] * log(pi[,k][pi[,k] > 0])))
          A[i-1] = sum(wi * Ai)
        }
      }

      total = G - A[H-1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H-1)) {
          Diff[i] = (A[i-1] - A[i]) / (W[i] - W[i-1])
          Prop[i] = (A[i-1] - A[i]) / total
          B[i] = exp(A[i-1]) / exp(A[i])
        }
      }

      Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
      out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) <- c(paste0("D_gamma"),
                         paste0("D_alpha", (H-1):1),
                         paste0("D_beta", (H-1):1),
                         paste0("Proportion", (H-1):1),
                         paste0("Differentiation", (H-1):1))
      return(out)
    }
Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called “Abun” and “Struc”, respectively) as described for the species diversity, we also need to input a phylogenetic “Tree” in Newick tree format.
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the “Abun” matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for the species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    rownames(Data) = paste0("Allele",1:6)
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
    Tree = c("(((Allele1:1.666254448,Allele2:2.886156926):4.370264926,Allele3:5.919367445):4.349065302,(Allele4:0.967060281,Allele5:4.965919121,Allele6:1.5361314):5.492297125);")
    # Run R function
    iDIPphylo(Data, Struc, Tree)

Output is given below:

                 [,1]
    Faith's PD   32.1525
    mean_T        9.4169
    PD_gamma     27.4388
    PD_alpha2    25.5194
    PD_alpha1    22.3231
    PD_beta2      1.075
    PD_beta1      1.143
    PD_prop2      0.351
    PD_prop1      0.649
    PD_diff2      0.124
    PD_diff1      0.149
The “effective” in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.
(1) The total branch length (Faith's PD) in the phylogenetic tree is 32.1525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 9.4169.
(3a) PD_gamma = 27.4388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 27.4388.
(3b) PD_alpha2 = 25.5194 means that the effective total branch length per region is 25.5194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 25.5194 x 1.075 = 27.4388 (= PD_gamma).
(3c) PD_alpha1 = 22.3231 means that the effective total branch length per population within each region is 22.3231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 22.3231 x 1.143 = 25.5194 (= PD_alpha2).
(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

Input data:
abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data).
struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix).
tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study.

NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace blank or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

Output:
(1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
(2) mean_T: the weighted (by species abundance) mean of the distances from the root node to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
(3) Gamma (or total) phylogenetic diversity (PD) of order 1, and alpha and beta PD for each level.
(4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information found in Level 1 and Level 2, respectively.
(5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1 measures the mean phylogenetic dissimilarity among communities (Level 1) within a region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among regions (Level 2), etc.

Three packages ("ade4", "ape" and "phytools") must be installed first.
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

# Main function: iDIPphylo
iDIPphylo = function(abun, struc, tree) {
  # Convert the Newick string into an ade4 "phylog" object
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, j] = 1 when node j lies on the path from the root to tip i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }
  # pA: abundance descending from each node, per population
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
  pB = c(phyloData$leaves, phyloData$nodes)  # branch length attached to each node
  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n
  TT = sum(gp * pB)                          # mean_T
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))  # gamma entropy
  PD = sum(pB[gp > 0])                       # Faith's PD

  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  # Lowest level (populations/communities)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H - 1] = sum(wi * Ai)

  # Intermediate levels: pool the populations belonging to each aggregate
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else {
          pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
        }
      }
      wi = ni / sum(ni)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i - 1] = sum(wi * Ai)
    }
  }

  # Differentiation, proportion of beta information, and beta at each level
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }

  # Express gamma and alpha as effective total branch lengths
  Gamma = exp(G) * TT; Alpha = exp(A) * TT
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste("FaithsPD"), paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H - 1):1),
                     paste0("PD_beta", (H - 1):1),
                     paste0("PD_prop", (H - 1):1),
                     paste0("PD_diff", (H - 1):1))
  return(out)
}
The differentiation component ΔD (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity Dβ. This is expected, as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix in which different δ values would be used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength in spatial genetic variation.

FIGURE 3 Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity, gamma, is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions)
For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ: Dγ = 1.6 on average across simulations for δ = 0.1, up to Dγ = 1.9 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here, region, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as for the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.
7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM
All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species: a deep-water fish, Etelis coruscans (Andrews et al., 2014), and a shallow-water fish, Zebrasoma flavescens (Eble et al., 2011).
The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km southwest of the MHI, at Johnston Atoll, and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).

FIGURE 4 Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass
We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.
7.1 | Species diversity
Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, $D_\gamma$, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_\alpha^{(2)} = 37.77$; Island: $D_\alpha^{(1)} = 27.75$). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_\beta^{(2)} = 1.29$ represents the number of region equivalents in the Hawaiian archipelago, while $D_\beta^{(1)} = 1.361$ is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on the sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain $\Delta_D$ (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
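The multiplicative structure behind these numbers can be checked directly from the values reported in Table 4. The short Python sketch below is only an arithmetic check, not part of the authors' software; the variable names are illustrative, and the tolerance reflects the rounding of the published values to three decimals.

```python
# Values copied from Table 4 (fish species diversity of order q = 1).
d_gamma = 48.744                       # ecosystem (Hawaiian Archipelago)
d_alpha = {2: 37.773, 1: 27.752}       # level 2 = region, level 1 = island
d_beta = {2: 1.290, 1: 1.361}

# At each level, gamma = alpha x beta, and the gamma of a level is the
# alpha of the level above it.
region_product = d_alpha[2] * d_beta[2]               # should recover d_gamma
island_product = d_alpha[1] * d_beta[1]               # should recover d_alpha[2]
full_product = d_alpha[1] * d_beta[1] * d_beta[2]     # should recover d_gamma

# Species equivalents lost when descending one level in the hierarchy.
loss_to_region = d_gamma - d_alpha[2]
loss_to_island = d_alpha[2] - d_alpha[1]

print(f"alpha(2) x beta(2) = {region_product:.3f} (gamma = {d_gamma})")
print(f"alpha(1) x beta(1) = {island_product:.3f} (alpha(2) = {d_alpha[2]})")
print(f"losses per level: {loss_to_region:.2f}, {loss_to_island:.2f}")
```

Both losses come out close to 10 species equivalents, which is the figure quoted in the text.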
We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than those of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and, to a lesser extent, Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
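The way the three orders respond to dominance can be illustrated with a small Python sketch of the standard Hill number formula. The community below is hypothetical: only the 55.1% share of the dominant species is taken from the text, and the remaining counts are invented for illustration.

```python
import math

def hill_number(abundances, q):
    """Hill number (effective number of species) of order q."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if q == 1:
        # Limit as q -> 1: exponential of Shannon entropy
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

# Hypothetical community of 1,000 individuals: one species holds 55.1%
# (the share reported for the dominant species at Nihoa), four species
# are moderately common and 49 species are rare singletons.
community = [551] + [100] * 4 + [1] * 49

d0 = hill_number(community, 0)   # species richness
d1 = hill_number(community, 1)   # Shannon diversity
d2 = hill_number(community, 2)   # Simpson (inverse concentration) diversity
print(f"D0 = {d0:.0f}, D1 = {d1:.2f}, D2 = {d2:.2f}")
```

Richness counts all 54 species, whereas the orders q = 1 and q = 2 both collapse to a handful of species equivalents, which is the signature of a community dominated by few species.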
Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.
7.2 | Genetic diversity
Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases, genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that, despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.

TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

Level                      Diversity
3 Hawaiian Archipelago     $D_\gamma = 48.744$
2 Region                   $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)} = 37.773$; $D_\beta^{(2)} = 1.290$
1 Island (community)       $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)} = 27.752$; $D_\beta^{(1)} = 1.361$

Differentiation among aggregates at each level
2 Region                   $\Delta_D^{(2)} = 0.290$
1 Island (community)       $\Delta_D^{(1)} = 0.153$
Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.
As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).
8 | DISCUSSION
Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications, including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.
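For readers who want to see the mechanics, the following Python sketch works through a three-level decomposition (ecosystem, region, population) of order q = 1. It follows the same entropy bookkeeping as the supplementary R function, but it is a simplified illustration under stated assumptions rather than the authors' implementation: the function names, the dictionary-based input format and the toy data are all hypothetical, aggregates are weighted by their total counts, and every population must contain at least one individual.

```python
import math

def shannon(proportions):
    """Shannon entropy (natural log) of a set of proportions."""
    return -sum(p * math.log(p) for p in proportions if p > 0)

def pooled_entropy(count_vectors):
    """Entropy of the element-wise sum of one or more count vectors."""
    totals = [sum(col) for col in zip(*count_vectors)]
    n = sum(totals)
    return shannon(t / n for t in totals)

def decompose(abun, region_of):
    """abun: population -> counts per allele/species; region_of: population -> region."""
    pops = list(abun)
    n = sum(sum(abun[k]) for k in pops)
    regions = sorted(set(region_of[k] for k in pops))
    members = {r: [abun[k] for k in pops if region_of[k] == r] for r in regions}

    w_pop = [sum(abun[k]) / n for k in pops]                  # population weights
    w_reg = [sum(map(sum, members[r])) / n for r in regions]  # region weights

    h_gamma = pooled_entropy([abun[k] for k in pops])
    h_alpha2 = sum(w * pooled_entropy(members[r]) for w, r in zip(w_reg, regions))
    h_alpha1 = sum(w * pooled_entropy([abun[k]]) for w, k in zip(w_pop, pops))
    h_w2, h_w1 = shannon(w_reg), shannon(w_pop)

    return {
        "D_gamma": math.exp(h_gamma),
        "D_alpha2": math.exp(h_alpha2),
        "D_alpha1": math.exp(h_alpha1),
        "D_beta2": math.exp(h_gamma - h_alpha2),
        "D_beta1": math.exp(h_alpha2 - h_alpha1),
        # Normalised differentiation; Diff1 is undefined if every region
        # contains a single population (zero denominator).
        "Diff2": (h_gamma - h_alpha2) / h_w2,
        "Diff1": (h_alpha2 - h_alpha1) / (h_w1 - h_w2),
    }

# Toy data: four equally sized populations sharing no alleles, two per region.
abun = {
    "P1": [5, 5, 0, 0, 0, 0, 0, 0],
    "P2": [0, 0, 5, 5, 0, 0, 0, 0],
    "P3": [0, 0, 0, 0, 5, 5, 0, 0],
    "P4": [0, 0, 0, 0, 0, 0, 5, 5],
}
region_of = {"P1": "A", "P2": "A", "P3": "B", "P4": "B"}
result = decompose(abun, region_of)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In this extreme case nothing is shared, so the sketch returns 8, 4 and 2 effective alleles at the ecosystem, region and population levels, two aggregate equivalents at each level, and a differentiation of 1 at both levels.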
FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities; (b) genetic diversity for Etelis coruscans; (c) genetic diversity for Zebrasoma flavescens

Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and, therefore, give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g., bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.
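The contrast between heterozygosity and the lower-order measures can be made concrete with a hypothetical bottleneck in which all rare alleles are lost. The Python sketch below uses invented allele counts and standard textbook definitions (allelic richness, exponential of Shannon entropy, expected heterozygosity); it illustrates the sensitivity argument and does not reproduce any analysis from the paper.

```python
import math

def frequencies(counts):
    total = sum(counts)
    return [c / total for c in counts if c > 0]

def richness(counts):
    """Order q = 0: number of alleles present."""
    return float(len(frequencies(counts)))

def shannon_diversity(counts):
    """Order q = 1: exponential of Shannon entropy."""
    return math.exp(-sum(p * math.log(p) for p in frequencies(counts)))

def heterozygosity(counts):
    """Expected heterozygosity, 1 - sum(p^2), a measure of order q = 2."""
    return 1.0 - sum(p * p for p in frequencies(counts))

# Hypothetical locus: two common alleles plus ten rare ones before the
# bottleneck; only the two common alleles survive it.
before = [450, 450] + [10] * 10
after = [500, 500]

def relative_loss(measure):
    return 1.0 - measure(after) / measure(before)

loss_richness = relative_loss(richness)
loss_shannon = relative_loss(shannon_diversity)
loss_heterozygosity = relative_loss(heterozygosity)
print(f"richness lost:        {loss_richness:.1%}")
print(f"Shannon diversity:    {loss_shannon:.1%}")
print(f"heterozygosity lost:  {loss_heterozygosity:.1%}")
```

In this example ten of twelve alleles disappear, yet heterozygosity drops far less than either richness or the order-1 diversity, which is why it is a weak indicator of such events when used alone.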
As proved by Chao et al. (2015, appendix S6) and stated in Appendix S3, information-based differentiation measures, such as those we propose here (Table 3), possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population, and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost's D, do not possess either of these two properties.
Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level, with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.

FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness; (b) allelic richness for Etelis coruscans; (c) allelic richness for Zebrasoma flavescens

TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level                      Diversity
3 Hawaiian Archipelago     $D_\gamma = 8.249$
2 Region                   $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)} = 8.083$; $D_\beta^{(2)} = 1.016$
1 Island (population)      $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)} = 7.077$; $D_\beta^{(1)} = 1.117$

Differentiation among aggregates at each level
2 Region                   $\Delta_D^{(2)} = 0.023$
1 Island (community)       $\Delta_D^{(1)} = 0.062$
Recently, Karlin and Smouse (2017, Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that, when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
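The two-population example can be verified numerically. The Python sketch below assumes equal weights for the two populations and computes the two-level Shannon differentiation as the excess of pooled over mean within-population entropy, divided by the entropy of the weights; the allele labels are arbitrary placeholders.

```python
import math

def shannon(proportions):
    """Shannon entropy (natural log) of a set of proportions."""
    return -sum(p * math.log(p) for p in proportions if p > 0)

n_alleles, n_shared = 10, 4
freq = 1.0 / n_alleles

# Each population has 10 equally frequent alleles; 4 are common to both.
pop1 = {f"shared_{i}": freq for i in range(n_shared)}
pop1.update({f"private1_{i}": freq for i in range(n_alleles - n_shared)})
pop2 = {f"shared_{i}": freq for i in range(n_shared)}
pop2.update({f"private2_{i}": freq for i in range(n_alleles - n_shared)})

# Pool the two populations with equal weights (an assumption of this sketch).
alleles = set(pop1) | set(pop2)
pooled = {a: 0.5 * pop1.get(a, 0.0) + 0.5 * pop2.get(a, 0.0) for a in alleles}

h_gamma = shannon(pooled.values())
h_alpha = 0.5 * shannon(pop1.values()) + 0.5 * shannon(pop2.values())
h_weights = shannon([0.5, 0.5])          # equals log(2) for two equal weights

differentiation = (h_gamma - h_alpha) / h_weights
expected = 1 - n_shared / n_alleles      # 1 - A/S, the "true dissimilarity"
print(f"Shannon differentiation = {differentiation:.4f}, 1 - A/S = {expected:.4f}")
```

Analytically, the pooled entropy is log 10 + 0.6 log 2 and the within-population entropy is log 10, so the ratio is exactly 0.6.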
In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown, and all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would surely be observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., obtained by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
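The negative bias of the observed (plug-in) entropy is easy to reproduce by simulation. The Python sketch below draws samples from a hypothetical community of 50 equally abundant species and compares the average plug-in entropy with the true value at two sample sizes. It illustrates the bias only; it does not implement the Chao and Jost (2015) estimator, and the community, sample sizes and seed are arbitrary choices.

```python
import math
import random
from collections import Counter

def plugin_entropy(sample):
    """Observed Shannon entropy: sample proportions plugged into the formula."""
    n = len(sample)
    return -sum((c / n) * math.log(c / n) for c in Counter(sample).values())

def mean_plugin_entropy(n_species, sample_size, replicates, rng):
    """Average plug-in entropy over repeated samples from an even community."""
    total = 0.0
    for _ in range(replicates):
        sample = rng.choices(range(n_species), k=sample_size)
        total += plugin_entropy(sample)
    return total / replicates

rng = random.Random(2018)        # fixed seed so the run is repeatable
n_species = 50                   # hypothetical community, all species equally common
true_entropy = math.log(n_species)

bias_small = mean_plugin_entropy(n_species, 30, 200, rng) - true_entropy
bias_large = mean_plugin_entropy(n_species, 3000, 200, rng) - true_entropy
print(f"true entropy          = {true_entropy:.3f}")
print(f"bias with 30 draws    = {bias_small:.3f}")
print(f"bias with 3,000 draws = {bias_large:.3f}")
```

With only 30 individuals the sample cannot contain all 50 species, so the observed entropy is necessarily too low; with 3,000 individuals the bias is still negative but close to zero.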
Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity Dβ and differentiation ΔD measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.
The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species and, therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.
Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

Level                      Diversity
3 Hawaiian Archipelago     $D_\gamma = 8.404$
2 Region                   $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)} = 8.290$; $D_\beta^{(2)} = 1.012$
1 Island (community)       $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)} = 7.690$; $D_\beta^{(1)} = 1.065$

Differentiation among aggregates at each level
2 Region                   $\Delta_D^{(2)} = 0.014$
1 Island (community)       $\Delta_D^{(1)} = 0.033$
We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization and, therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
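The percentages quoted above can be recovered from the alpha diversities in Tables 4 to 6. The Python sketch below assumes that each difference is expressed relative to the regional alpha diversity; this assumption reproduces the reported figures to within the rounding of the tabulated values, but the text does not state the denominator explicitly.

```python
# Regional (level 2) and island (level 1) alpha diversities of order q = 1,
# copied from Tables 4, 5 and 6.
alpha = {
    "fish species":   {"region": 37.773, "island": 27.752},
    "E. coruscans":   {"region": 8.083, "island": 7.077},
    "Z. flavescens":  {"region": 8.290, "island": 7.690},
}

# Relative drop in diversity from the regional to the island level,
# expressed as a percentage of the regional value (assumed convention).
percent_drop = {
    name: 100.0 * (d["region"] - d["island"]) / d["region"]
    for name, d in alpha.items()
}

for name, value in percent_drop.items():
    print(f"{name}: {value:.1f}%")
```

The species-level drop is roughly two to four times larger than either genetic drop, which is the discrepancy the paragraph describes.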
In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)'s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)'s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals, or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithms may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.
The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.
An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.
ACKNOWLEDGEMENTS
This work was assisted through participation in the "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.
DATA ARCHIVING STATEMENT
All data used in this manuscript are available in DRYAD (https://doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).
ORCID
Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493
Christine Edwards http://orcid.org/0000-0001-8837-4872
REFERENCES
Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2
Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101
Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.
Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis “marshi” (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665
Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.
Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.
Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654
Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551
Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247
Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272
Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540
Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471
Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014
Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819
Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083
Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1
Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2
Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036
Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75
Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352
Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x
Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x
Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1
Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x
Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum x falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.
Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.
Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.
Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3
Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x
Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.
Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x
Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x
Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x
Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.
Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x
Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.
Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.
Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696
Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Consortium, Hawaii Reef Connectivity (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B: Biological Sciences, 283(1829).
Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a “pristine” coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z
Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765
Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x
Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.
Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422
Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756
de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.
Whitlock, M. C. (2011). G’ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x
Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516
Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.
Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593
APPENDIX S1. Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an “ideal” population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the “effective” number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
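The equivalence in the table can be checked numerically. The sketch below is illustrative only: it assumes the usual heterozygosity-based definitions, He = 1 − Σp² and effective number of alleles = 1/Σp², and the function names are hypothetical rather than taken from the authors' code.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def effective_number_of_alleles(freqs):
    """Number of equally frequent alleles that would give the same He."""
    return 1.0 / sum(p * p for p in freqs)


# Allele frequencies from the table above (alleles A1..A5).
pop1 = [0.5, 0.5, 0.0, 0.0, 0.0]
pop2 = [0.01, 0.10, 0.665, 0.215, 0.01]

for name, freqs in (("Population 1", pop1), ("Population 2", pop2)):
    print(name,
          round(expected_heterozygosity(freqs), 3),
          round(effective_number_of_alleles(freqs), 3))
```

Both populations come out with He close to 0.5 and an effective number of alleles close to two, even though population 2 carries five distinct alleles.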
APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol                      Definition
${}^{q}D$                   abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$                  phylogenetic diversity of order q
$S$                         total number of distinct elements (= species/alleles)
$H$                         Shannon entropy
$K$                         number of regions
$J_k$                       number of local populations/communities within region k
$w_{jk}$                    weight given to population/community j of region k
$w_{+k}$                    weight given to region k
$D_\alpha^{(l)}$            alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$             beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$            gamma abundance diversity at level l of the hierarchy
$l$                         superscript to denote population/community, subregion, region, …
$D_\gamma$                  gamma abundance diversity at the ecosystem level
$N_{injk}$                  number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$                   total number of copies of allele/species i in population/community j of region k
$N_{+jk}$                   total number of alleles/individuals in population j of region k
$N_{++k}$                   total number of alleles/individuals in region k
$N_{+++}$                   total number of alleles/individuals in the ecosystem
$p_{i|jk}$                  frequency of allele/species i in population j of region k
$p_{i|+k}$                  frequency of allele/species i in region k
$p_{i|++}$                  frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$       alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$        alpha entropy for region k
$H_\alpha^{(l)}$            total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$            abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$                         number of branch segments in the phylogenetic tree
$L_i$                       length of branch i = 1, 2, 3, …, B
$a_i$                       total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$                   mean branch length
$T$                         depth of an ultrametric tree ($= \bar{T}$)
$Q$                         Rao’s quadratic entropy
$I$                         phylogenetic entropy
$a_{i|jk}$                  total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$                  total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$                  total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$           alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$            beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$           gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$                 gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$       alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$        alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$            total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$         phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)
APPENDIX S3. Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term “allele frequency” with “species frequency.” We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$
Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.
Proof. Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows. Because $\sum_{m} w_m p_{i|m} \ge w_j p_{i|j}$ for every j,

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big) \le -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j} \ln\big(w_j p_{i|j}\big)\Big)$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$
The Shannon differentiation measure in Eq. (C1) takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). Shannon differentiation satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and “true dissimilarity” properties for Shannon differentiation.
(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation also satisfies the “true dissimilarity” property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then Shannon differentiation gives 1 − A/S, the true proportion of non-shared species in a community.
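The bounds of Eq. (C1) and the “true dissimilarity” property in Part (B) can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the function names and the three toy data sets are hypothetical, and the Part (B) case assumes three communities with S = 5 equally common species, A = 2 of them shared by all.

```python
import math


def shannon_entropy(freqs):
    """Shannon entropy -sum p ln p, skipping zero frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def shannon_differentiation(pops, weights):
    """Eq. (C1): (H_gamma - H_alpha) / (-sum_j w_j ln w_j).

    pops    -- list of allele-frequency vectors, one per population
    weights -- population weights summing to 1
    """
    n_alleles = len(pops[0])
    pooled = [sum(w * p[i] for w, p in zip(weights, pops))
              for i in range(n_alleles)]
    h_gamma = shannon_entropy(pooled)
    h_alpha = sum(w * shannon_entropy(p) for w, p in zip(weights, pops))
    return (h_gamma - h_alpha) / shannon_entropy(weights)


w = [0.2, 0.3, 0.5]
identical = [[0.6, 0.3, 0.1]] * 3                          # same distribution everywhere
distinct = [[1, 0, 0, 0], [0, 0.5, 0.5, 0], [0, 0, 0, 1]]  # no shared alleles

# Part (B): S = 5 equally common species per community, A = 2 shared by all three.
S, A = 5, 2
shared = [1 / S] * A
partly_shared = [
    shared + [1 / S] * 3 + [0] * 6,
    shared + [0] * 3 + [1 / S] * 3 + [0] * 3,
    shared + [0] * 6 + [1 / S] * 3,
]
equal_w = [1 / 3] * 3

print(shannon_differentiation(identical, w))
print(shannon_differentiation(distinct, w))
print(shannon_differentiation(partly_shared, equal_w), 1 - A / S)
```

Up to floating-point error, the three calls return 0, 1 and 1 − A/S = 0.6, respectively.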
Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$
(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \tag{C2}$$
Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:
$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} = \sum_{k=1}^{K} w_{+k}\, p_{i|+k}.$$

Then the “gamma” entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} w_{+k}\, p_{i|+k}\Big) \ln\Big(\sum_{m=1}^{K} w_{+m}\, p_{i|+m}\Big).$$

The corresponding “alpha” entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.
To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele i in population j, with population weights $(w_{1k}/w_{+k},\ w_{2k}/w_{+k},\ \ldots,\ w_{J_k k}/w_{+k})$, $j = 1, 2, \ldots, J_k$. The “gamma” entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding “alpha” entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}}\, H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}}\, H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln\frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}.$$
This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$
Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\big(w_{jk}/w_{+k}\big)}. \tag{C6}$$
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case we can decompose the gamma entropy as

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln\big(w_{jk}\, p_{i|jk}\big)\Big)$$

$$= -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \Big[\ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big]\Big)$$

$$= -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln p_{i|jk}\Big) - \sum_{i=1}^{S} \Big(\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln\frac{w_{jk}}{w_{+k}}\Big) - \sum_{i=1}^{S} \Big(\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln w_{+k}\Big)$$

$$= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k}$$

$$\equiv H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big].$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
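The completely distinct special case can be verified numerically. The sketch below is an illustration under stated assumptions: the three populations, their weights and their grouping into two regions are hypothetical, and all variable and function names are invented for this example.

```python
import math


def entropy(freqs):
    """Shannon entropy -sum p ln p, skipping zero frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


# Hypothetical data: region 0 holds populations 0 and 1, region 1 holds population 2.
# Every population carries private alleles only (no allele is shared).
region_of = [0, 0, 1]
w_pop = [0.2, 0.3, 0.5]                     # w_jk, summing to 1
pops = [
    [0.7, 0.3, 0, 0, 0, 0],
    [0, 0, 0.5, 0.5, 0, 0],
    [0, 0, 0, 0, 0.9, 0.1],
]
n_alleles = len(pops[0])
n_regions = max(region_of) + 1

# Region weights w_+k are the sums of their population weights.
w_reg = [sum(w for w, r in zip(w_pop, region_of) if r == k)
         for k in range(n_regions)]


def pooled(indices):
    """Weighted allele frequencies of the aggregate formed by the given populations."""
    total = sum(w_pop[j] for j in indices)
    return [sum(w_pop[j] * pops[j][i] for j in indices) / total
            for i in range(n_alleles)]


h_gamma = entropy(pooled(range(len(pops))))
h_alpha2 = sum(
    w_reg[k] * entropy(pooled([j for j, r in enumerate(region_of) if r == k]))
    for k in range(n_regions)
)
h_alpha1 = sum(w * entropy(p) for w, p in zip(w_pop, pops))

max_among_regions = entropy(w_reg)          # -sum_k w_+k ln w_+k
max_within_regions = -sum(                  # -sum_k sum_j w_jk ln(w_jk / w_+k)
    w * math.log(w / w_reg[r]) for w, r in zip(w_pop, region_of)
)

delta2 = (h_gamma - h_alpha2) / max_among_regions     # Eq. (C5)
delta1 = (h_alpha2 - h_alpha1) / max_within_regions   # Eq. (C6)
print(delta2, delta1)
```

Both differentiation measures evaluate to 1 (up to rounding), and the gamma entropy equals the within-population entropy plus the two maximum terms, as in the decomposition above.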
As shown in the proof of Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and “true dissimilarity” properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy
For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i\, a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i\, a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i\, a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[I_\alpha^{(2)} - I_\alpha^{(1)}\big] + \big[I_\gamma - I_\alpha^{(2)}\big]. \tag{C7}$$
Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:
$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and “true dissimilarity” properties, are parallel to those of allelic diversity and thus are omitted.
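The role of the tree depth T in these bounds can be seen in a small numerical example. The sketch below is an illustration only: it uses a two-level hierarchy (populations within one ecosystem) to keep the example short, a hypothetical four-tip ultrametric tree of depth T = 2, and invented names throughout.

```python
import math

# Hypothetical ultrametric tree of depth T = 2: ((A:1,B:1):1,(C:1,D:1):1).
# Each branch is stored as (length L_i, set of tips descended from it).
branches = [
    (1.0, {"A"}), (1.0, {"B"}), (1.0, {"C"}), (1.0, {"D"}),
    (1.0, {"A", "B"}), (1.0, {"C", "D"}),
]
T = 2.0


def phylo_entropy(tip_freqs):
    """I = -sum_i L_i a_i ln a_i, where a_i sums the frequencies of tips below branch i."""
    total = 0.0
    for length, tips in branches:
        a = sum(tip_freqs.get(t, 0.0) for t in tips)
        if a > 0:
            total -= length * a * math.log(a)
    return total


def decompose(pops, weights):
    """Return (I_gamma - I_alpha, upper bound -T * sum w ln w) for a two-level hierarchy."""
    pooled = {}
    for w, pop in zip(weights, pops):
        for tip, p in pop.items():
            pooled[tip] = pooled.get(tip, 0.0) + w * p
    i_gamma = phylo_entropy(pooled)
    i_alpha = sum(w * phylo_entropy(pop) for w, pop in zip(weights, pops))
    bound = -T * sum(w * math.log(w) for w in weights)
    return i_gamma - i_alpha, bound


weights = [0.4, 0.6]
distinct = [{"A": 0.5, "B": 0.5}, {"C": 0.3, "D": 0.7}]   # no shared branches
identical = [{"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}] * 2

print(decompose(distinct, weights))
print(decompose(identical, weights))
```

For the phylogenetically distinct populations the difference between gamma and alpha entropy reaches the bound, and for identical populations it is zero, matching the two extreme cases stated in the theorem.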
SUPPLEMENTARY INFORMATION
Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represent the ecosystem. Shannon entropies $H_{*}$ are calculated from allele/species abundances at each level of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1: diagram of the three-level hierarchy (1 Population or Community, 2 Region, 3 Ecosystem), showing the frequencies $p_{i|jk}$, $p_{i|+k}$, $p_{i|++}$ and the within, between, and total entropy components at each level; image not reproduced.]
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set, where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\ p_1 + p_2,\ p_1 + p_2 + p_3,\ p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2: ultrametric tree with tips $p_1$ to $p_5$, internal nodes $p_1 + p_2$, $p_1 + p_2 + p_3$ and $p_4 + p_5$, and the most recent common ancestor (MRCA) at the root; image not reproduced.]
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called “Abun” and “Struc”, respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1
        Population 1
        Population 2
    Region 2
        Population 3
        Population 4
        Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

           Pop1   Pop2   Pop3   Pop4   Pop5
Allele1       1     16      2     10     15
Allele2       0      0      0      5     14
Allele3       7     12     11      1      0
Allele4       0      5     14      1     21
Allele5       2      1      0     11     10
Allele6       0      1      3      2      0
The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

          Pop1          Pop2          Pop3          Pop4          Pop5
Level3    Ecosystem     Ecosystem     Ecosystem     Ecosystem     Ecosystem
Level2    Region1       Region1       Region2       Region2       Region2
Level1    Population1   Population2   Population3   Population4   Population5
For the above simple example, input data for R function iDIP (code given below) are shown below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    Struc = rbind(rep("Ecosystem", 5), c(rep("Region1", 2), rep("Region2", 3)), paste0("Population", 1:5))

    # Run R function
    iDIP(Data, Struc)
Output is given below.

                       [,1]
    D_gamma           5.272
    D_alpha.2         4.679
    D_alpha.1         3.540
    D_beta.2          1.127
    D_beta.1          1.322
    Proportion.2      0.300
    Proportion.1      0.700
    Differentiation.2 0.204
    Differentiation.1 0.310
We give simple interpretations for the above output (“effective” is always meant in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha.2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta.2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).
(1c) D_alpha.1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta.1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha.2).
(2) Proportion.2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion.1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation.2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation.1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
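The reported output can be cross-checked independently of the R function. The following Python sketch is not the authors' code: it hard-codes the three-level example (two regions, five populations), uses relative population sizes as weights, and applies the entropy decomposition and the normalizations of Eqs. (C5) and (C6); all names are chosen for this illustration.

```python
import math


def entropy(counts):
    """Shannon entropy of the relative frequencies implied by raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Columns of the example matrix: counts of Allele1..Allele6 in each population.
pops = [
    [1, 0, 7, 0, 2, 0],
    [16, 0, 12, 5, 1, 1],
    [2, 0, 11, 14, 0, 3],
    [10, 5, 1, 1, 11, 2],
    [15, 14, 0, 21, 10, 0],
]
regions = [[0, 1], [2, 3, 4]]      # Region1 = Pop1-2, Region2 = Pop3-5


def pool(indices):
    """Sum allele counts over the given populations."""
    return [sum(pops[j][i] for j in indices) for i in range(len(pops[0]))]


n = sum(map(sum, pops))
w_pop = [sum(p) / n for p in pops]                 # weights = relative sizes
region_counts = [pool(r) for r in regions]
w_reg = [sum(c) / n for c in region_counts]

h_gamma = entropy(pool(range(len(pops))))
h_alpha2 = sum(w * entropy(c) for w, c in zip(w_reg, region_counts))
h_alpha1 = sum(w * entropy(p) for w, p in zip(w_pop, pops))
h_w_reg = -sum(w * math.log(w) for w in w_reg)
h_w_pop = -sum(w * math.log(w) for w in w_pop)

result = {
    "D_gamma": math.exp(h_gamma),
    "D_alpha.2": math.exp(h_alpha2),
    "D_alpha.1": math.exp(h_alpha1),
    "D_beta.2": math.exp(h_gamma - h_alpha2),
    "D_beta.1": math.exp(h_alpha2 - h_alpha1),
    "Proportion.2": (h_gamma - h_alpha2) / (h_gamma - h_alpha1),
    "Proportion.1": (h_alpha2 - h_alpha1) / (h_gamma - h_alpha1),
    "Differentiation.2": (h_gamma - h_alpha2) / h_w_reg,
    "Differentiation.1": (h_alpha2 - h_alpha1) / (h_w_pop - h_w_reg),
}
for name, value in result.items():
    print(f"{name:18s}{value:.3f}")
```

The denominator of Differentiation.1 uses the identity that the within-region maximum in Eq. (C4) equals the entropy of the population weights minus the entropy of the region weights.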
R code for Information-based Diversity Partitioning for Any Number of Levels

    # Input data:
    #   abun:  alleles-by-population frequency matrix data (or species-by-community
    #          frequency matrix data)
    #   struc: level-by-population hierarchical structure matrix (or level-by-community
    #          hierarchical structure matrix)
    # Output:
    #   (1) Gamma (or total) diversity, alpha and beta diversity at each level
    #   (2) Proportion of total beta information found at each level
    #   (3) Differentiation (dissimilarity) for each level. For example, Differentiation.1
    #       measures the mean dissimilarity among populations (level 1) within a region;
    #       Differentiation.2 measures the mean dissimilarity among regions (level 2), etc.
    # NOTE: iDIP cannot handle "blank" data or "NA" entry. You must replace "blank"
    #       or "NA" in your data by 0 or any numerical value. Also, there must be at least
    #       one species/allele in a population or a community.

    # Main function iDIP
    iDIP = function(abun, struc) {
      n = sum(abun); N = ncol(abun)
      ga = rowSums(abun)
      gp = ga[ga > 0] / n
      G = sum(-gp * log(gp))                       # gamma (ecosystem) entropy
      S = length(gp)
      H = nrow(struc)                              # number of hierarchical levels
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      # Lowest level: entropy of population weights and weighted mean population entropy
      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[H - 1] = sum(wi * Ai)

      # Intermediate levels: pool populations that belong to the same aggregate
      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          ai = matrix(0, ncol = NN, nrow = nrow(abun))
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) { ai[, j] = abun[, II]
            } else { ai[, j] = rowSums(abun[, II]) }
          }
          pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
          wi = colSums(ai) / sum(ai)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
          A[i - 1] = sum(wi * Ai)
        }
      }

      total = G - A[H - 1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
      out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) <- c(paste0("D_gamma"),
                         paste0("D_alpha.", (H - 1):1),
                         paste0("D_beta.", (H - 1):1),
                         paste0("Proportion.", (H - 1):1),
                         paste0("Differentiation.", (H - 1):1))
      return(out)
    }
PhylogeneticDiversity(RfunctioniDIPphylo)
Inadditiontothetwodatamatrices(calledldquoAbunrdquoldquoStrucrdquorespectively)asdescribedinthespeciesdiversitywealsoneedtoinputaphylogeneticldquoTreerdquoinNewicktreeformat)
(a) Abunspecifyingspeciesalleles(rows)raworrelativefrequenciesineachpopulationcommunity(columns) NOTEspeciesnamesintheldquoAbunrdquomatrixshouldbeexactlythesameas
thoseintheuploadedNewicktreeformatiDIPcannothandleldquoblankrdquoorldquoNArdquoentryYoumustreplaceldquoblankrdquoorldquoNArdquoinyourdataby0oranynumericalvalueAlsotheremustbeatleastonespeciesalleleinapopulationora
community(b) Strucspecifyinghierarchicalstructurematrixseethesimpleexamplegiven
aboveforthespeciesdiversity
(c) Treeaphylogenetictreespannedbyallspeciesconsideredinthestudy
Hereweusethesamehierarchicalstructureandalleleabundancesdataasinthe
speciesdiversityforillustrationAsimulatedphylogenetictreefor6speciesaregivenbelow
InputdataData=cbind(c(107020)c(16012511)c(20111403)c(10511112)c(1514021100))
rownames(Data)=paste0(Allele16)Struc=rbind(rep(Ecosystem5)c(rep(Region12)rep(Region23))paste0(Community15))
Tree=c((((Allele11666254448Allele22886156926)4370264926Allele35919367445)4349065302(Allele4967060281Allele54965919121Allele615361314)5492297125))
Run R function: iDIPphylo(Data,Struc,Tree)
Output is given below:
               [,1]
Faith's PD  321.525
mean_T       94.169
PD_gamma    274.388
PD_alpha2   255.194
PD_alpha1   223.231
PD_beta2      1.075
PD_beta1      1.143
PD_prop2      0.351
PD_prop1      0.649
PD_diff2      0.124
PD_diff1      0.149
The “effective” in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.
(1) The total branch length (Faith’s PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 × 1.075 = 274.388 (= PD_gamma), up to rounding of the reported values.
(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 × 1.143 = 255.194 (= PD_alpha2), up to rounding of the reported values.
(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.5%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
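The relationships quoted in items (3) and (4) can be checked directly from the printed output. The sketch below is only an arithmetic check on the rounded values reported above, not part of iDIPphylo; the variable names are illustrative. It assumes, as in the function's own code, that the proportion at each level is that level's share of the log beta diversities.

```r
# Values copied from the example output above (order q = 1)
pd_gamma  <- 274.388
pd_alpha2 <- 255.194   # effective branch length per region
pd_alpha1 <- 223.231   # effective branch length per community within a region
pd_beta2  <- 1.075     # region equivalents
pd_beta1  <- 1.143     # community equivalents per region

# Multiplicative decomposition: alpha x beta at one level gives the
# alpha (or gamma) diversity one level up, up to rounding
alpha2_from_level1 <- pd_alpha1 * pd_beta1
gamma_from_level2  <- pd_alpha2 * pd_beta2

# Share of the total beta information at each level; entropies are
# additive, so the shares are computed on the log scale
beta_info <- log(c(level2 = pd_beta2, level1 = pd_beta1))
pd_prop   <- beta_info / sum(beta_info)

print(round(c(alpha2 = alpha2_from_level1, gamma = gamma_from_level2), 2))
print(round(pd_prop, 3))
```

Small discrepancies in the last digits are expected because the inputs are already rounded to three decimals.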
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
Input data:
abun: alleles by population frequency matrix data (or species by community frequency matrix data)
struc: level by population hierarchical structure matrix (or level by community hierarchical structure matrix)
tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study
NOTE: iDIP cannot handle “blank” or “NA” entry. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
Output:
(1) Faith’s PD: the total sum of branch lengths of a phylogenetic tree.
(2) mean T: weighted (by species abundance) mean of the distances from the root node to each of the tips in a phylogenetic tree. For an ultrametric tree, mean T = tree depth.
(3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level.
(4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information found in Level 1 and Level 2, respectively.
(5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1 measures the mean phylogenetic dissimilarity among communities (Level 1) within a region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among regions (Level 2), etc.
Three packages, “ade4”, “ape”, and “phytools”, must be installed first.
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)
Main function iDIPphylo
iDIPphylo=function(abun,struc,tree){
  phyloData<-newick2phylog(tree)
  Temp<-as.matrix(abun[names(phyloData$leaves),])
  nodenames=c(names(phyloData$leaves),names(phyloData$nodes))
  # M[i,j]=1 if branch (node) j lies on the path from the root to tip i
  M=matrix(0,nrow=length(phyloData$leaves),ncol=length(nodenames),dimnames=list(names(phyloData$leaves),nodenames))
  for(i in 1:length(phyloData$leaves)){M[i,][unlist(phyloData$paths[i])]=rep(1,length(unlist(phyloData$paths[i])))}
  # pA: abundance of every branch (tips and internal nodes) in each population/community
  pA=matrix(0,ncol=ncol(abun),nrow=length(nodenames),dimnames=list(nodenames,colnames(abun)))
  for(i in 1:ncol(abun)){pA[,i]=Temp[,i]%*%M}
  pB=c(phyloData$leaves,phyloData$nodes)   # branch lengths
  n=sum(abun);N=ncol(abun)
  ga=rowSums(pA)
  gp=ga/n
  TT=sum(gp*pB)                            # abundance-weighted mean root-to-tip distance
  G=sum(-pB[gp>0]*gp[gp>0]/TT*log(gp[gp>0]/TT))
  PD=sum(pB[gp>0])                         # Faith's PD
  H=nrow(struc)
  A=numeric(H-1);W=numeric(H-1);B=numeric(H-1)
  Diff=numeric(H-1);Prop=numeric(H-1)
  wi=colSums(abun)/n
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]))
  pi=sapply(1:N,function(k) pA[,k]/sum(abun[,k]))
  Ai=sapply(1:N,function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)))
  A[H-1]=sum(wi*Ai)
  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,]);NN=length(I)
      pi=matrix(0,ncol=NN,nrow=nrow(pA));ni=numeric(NN)
      for(j in 1:NN){
        II=which(struc[i,]==I[j])
        if(length(II)==1){pi[,j]=pA[,II]/sum(abun[,II]);ni[j]=sum(abun[,II])
        }else{pi[,j]=rowSums(pA[,II])/sum(abun[,II]);ni[j]=sum(abun[,II])}
      }
      wi=ni/sum(ni)
      W[i-1]=-sum(wi*log(wi))
      Ai=sapply(1:NN,function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)))
      A[i-1]=sum(wi*Ai)
    }
  }
  total=G-A[H-1]
  Diff[1]=(G-A[1])/W[1]
  Prop[1]=(G-A[1])/total
  B[1]=exp(G)/exp(A[1])
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1])
      Prop[i]=(A[i-1]-A[i])/total
      B[i]=exp(A[i-1])/exp(A[i])
    }
  }
  Gamma=exp(G);Alpha=exp(A)
  out=matrix(c(PD,TT,Gamma,Alpha,B,Prop,Diff),ncol=1)
  rownames(out)<-c(paste("Faith's PD"),
    paste("mean_T"),
    paste0("PD_gamma"),
    paste0("PD_alpha",(H-1):1),
    paste0("PD_beta",(H-1):1),
    paste0("PD_prop",(H-1):1),
    paste0("PD_diff",(H-1):1)
  )
  return(out)
}
in which different δ values would be used among populations, sub-regions and regions, then the beta and differentiation components would follow different trends in relation to the strength in spatial genetic variation.
For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ: $D_\gamma = 1.6$ on average across simulations for δ = 0.1, up to $D_\gamma = 1.9$ for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here region, subregions, and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as for the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.
7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM
All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish, Etelis coruscans (Andrews et al., 2014), and a shallow-water fish, Zebrasoma flavescens (Eble et al., 2011).
The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction, and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string of uninhabited low islands, atolls, shoals, and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification, and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km southwest of the MHI at Johnston Atoll and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston’s reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).
FIGURE 4 Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass
We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.
7.1 | Species diversity
Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, $D_\gamma$, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_\alpha^{(2)} = 37.77$; Island: $D_\alpha^{(1)} = 27.75$). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity, $D_\beta^{(2)} = 1.29$, represents the number of region equivalents in the Hawaiian archipelago, while $D_\beta^{(1)} = 1.361$ is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain $\Delta_D$ (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI, and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
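The multiplicative structure of the decomposition can be checked from the three diversities quoted in Table 4. The snippet below is a small arithmetic sketch using those published values only; the variable names are illustrative and nothing here re-analyses the underlying data.

```r
# Diversities of order q = 1 taken from Table 4
d_gamma  <- 48.744   # whole archipelago
d_alpha2 <- 37.773   # per region
d_alpha1 <- 27.752   # per island within a region

# Beta diversity at a level = diversity one level up / alpha at that level
d_beta2 <- d_gamma  / d_alpha2   # region equivalents
d_beta1 <- d_alpha2 / d_alpha1   # island equivalents per region

# Species equivalents lost when descending one level
lost <- c(gamma_to_region = d_gamma - d_alpha2,
          region_to_island = d_alpha2 - d_alpha1)

print(round(c(d_beta2 = d_beta2, d_beta1 = d_beta1), 3))
print(round(lost, 2))
```

The ratios reproduce the beta diversities listed in Table 4, and both differences are close to the "approximately 10 species equivalents" mentioned in the text.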
We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than those of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and to a lesser extent Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
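To make the role of the order q concrete, here is a minimal sketch of Hill numbers in base R. The function name `hill_number` and both abundance vectors are illustrative assumptions; in particular, the "dominated" community is a made-up example in which one species holds 55.1% of the individuals, not the actual Nihoa data.

```r
# Hill number (effective number of species or alleles) of order q
hill_number <- function(abundance, q) {
  p <- abundance[abundance > 0] / sum(abundance)
  if (q == 1) {
    exp(-sum(p * log(p)))        # limit q -> 1: exponential of Shannon entropy
  } else {
    sum(p^q)^(1 / (1 - q))       # q = 0 gives richness, q = 2 inverse Simpson
  }
}

# Diversity profile at the three orders used in the text
profile <- function(abundance) {
  sapply(c(q0 = 0, q1 = 1, q2 = 2), function(q) hill_number(abundance, q))
}

even      <- rep(1, 10)                      # 10 equally common species
dominated <- c(0.551, rep(0.449 / 9, 9))     # hypothetical: one dominant species

print(round(profile(even), 3))
print(round(profile(dominated), 3))
```

For the even community all three orders coincide with richness, whereas for the dominated community the values fall steadily as q increases, which is the pattern used above to diagnose rare species and dominance.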
Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.
7.2 | Genetic diversity
Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases, genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that, despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.
TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem
Level                      Diversity
3 Hawaiian Archipelago     $D_\gamma = 48.744$
2 Region                   $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)} = 37.773$; $D_\beta^{(2)} = 1.290$
1 Island (community)       $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)} = 27.752$; $D_\beta^{(1)} = 1.361$
Differentiation among aggregates at each level
2 Region                   $\Delta_D^{(2)} = 0.290$
1 Island (community)       $\Delta_D^{(1)} = 0.153$
Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.
As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).
8 | DISCUSSION
Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications, including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.
Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon’s entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and, therefore, give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g., bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts the Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.
FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1, and 2. (a) Fish species diversity of Hawaiian coral reef communities; (b) genetic diversity for Etelis coruscans; (c) genetic diversity for Zebrasoma flavescens
As proved by Chao et al. (2015, appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost’s D, do not possess any of these two properties.
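A toy case can illustrate the first property. The sketch below is our own illustration rather than one of the examples in Chao et al. (2015): two equally weighted populations share no alleles, and half of the copies of population 1's private allele are then replaced by a second private allele. The helper names, the equal weights, and the use of Nei's GST computed from expected heterozygosities are all assumptions of this sketch.

```r
shannon  <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }
gene_div <- function(p) 1 - sum(p^2)           # expected heterozygosity

# freqs: alleles in rows, populations in columns; each column sums to 1
compare_measures <- function(freqs) {
  w      <- rep(1 / ncol(freqs), ncol(freqs))  # equal weights (assumption)
  pooled <- as.vector(freqs %*% w)
  hs  <- sum(w * apply(freqs, 2, gene_div))
  ht  <- gene_div(pooled)
  gst <- (ht - hs) / ht
  # Normalized mutual information (Shannon differentiation), two-level case
  shannon_diff <- (shannon(pooled) - sum(w * apply(freqs, 2, shannon))) / shannon(w)
  c(GST = gst, shannon_diff = shannon_diff)
}

before <- compare_measures(cbind(c(1, 0), c(0, 1)))            # one private allele each
after  <- compare_measures(cbind(c(0.5, 0.5, 0), c(0, 0, 1)))  # extra private allele in pop 1

print(round(rbind(before, after), 3))
```

The two populations remain completely distinct in both cases, and the information-based measure stays at its maximum, while GST drops after the extra unshared allele is introduced.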
Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and “true dissimilarity” properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.
FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness; (b) allelic richness for Etelis coruscans; (c) allelic richness for Zebrasoma flavescens
TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci
Level                      Diversity
3 Hawaiian Archipelago     $D_\gamma = 8.249$
2 Region                   $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)} = 8.083$; $D_\beta^{(2)} = 1.016$
1 Island (population)      $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)} = 7.077$; $D_\beta^{(1)} = 1.117$
Differentiation among aggregates at each level
2 Region                   $\Delta_D^{(2)} = 0.023$
1 Island (community)       $\Delta_D^{(1)} = 0.062$
Recently, Karlin and Smouse (2017; Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the “true dissimilarity” property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of “true dissimilarity” can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse’s (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure “true dissimilarity.” Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse’s measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
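The 60% figure in the two-population example can be reproduced with a few lines of base R. This is a sketch of the two-level case only, assuming equal population weights and using the normalized mutual information form mentioned above; the variable names are illustrative, and the Karlin and Smouse measure is not reimplemented here.

```r
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

n_alleles <- 10                       # equally frequent alleles per population
n_shared  <- 4                        # alleles present in both populations
n_private <- n_alleles - n_shared

# Rows: 4 shared alleles, 6 private to population I, 6 private to population II
pop1 <- c(rep(0.1, n_shared), rep(0.1, n_private), rep(0,   n_private))
pop2 <- c(rep(0.1, n_shared), rep(0,   n_private), rep(0.1, n_private))

w      <- c(0.5, 0.5)                 # equal weights (assumption)
pooled <- w[1] * pop1 + w[2] * pop2

gamma_entropy <- shannon(pooled)
alpha_entropy <- w[1] * shannon(pop1) + w[2] * shannon(pop2)
shannon_diff  <- (gamma_entropy - alpha_entropy) / shannon(w)

print(c(shannon_diff = shannon_diff, true_nonshared = 1 - n_shared / n_alleles))
```

Both printed values coincide, which is what the “true dissimilarity” property requires in this idealized setting.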
In this paper, all diversity measures (alpha, beta, and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would be surely observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
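The negative bias of the observed (plug-in) entropy is easy to see in a deliberately extreme simulation. The community below (50 equally common species sampled with only 20 individuals) is hypothetical and chosen so that the effect is unambiguous; the bias-corrected estimator of Chao and Jost (2015) is not implemented in this sketch.

```r
set.seed(1)                    # any seed works; the bound checked below is deterministic
n_species   <- 50              # hypothetical community: 50 equally common species
sample_size <- 20              # severe undersampling: fewer individuals than species
true_entropy <- log(n_species)

# Plug-in entropy: substitute sample proportions into Shannon's formula
plug_in_entropy <- replicate(500, {
  draws <- sample.int(n_species, sample_size, replace = TRUE)
  p_hat <- as.numeric(table(draws)) / sample_size
  -sum(p_hat * log(p_hat))
})

mean_bias <- mean(plug_in_entropy) - true_entropy
print(c(true = true_entropy, mean_plug_in = mean(plug_in_entropy), bias = mean_bias))
```

With 20 individuals the plug-in value can never exceed log(20), so it necessarily falls short of the true entropy log(50); in less extreme cases the bias is smaller but has the same sign.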
Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity $D_\beta$ and differentiation $\Delta_D$ measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.
The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.
Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.
TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci
Level                      Diversity
3 Hawaiian Archipelago     $D_\gamma = 8.404$
2 Region                   $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)} = 8.290$; $D_\beta^{(2)} = 1.012$
1 Island (community)       $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)} = 7.690$; $D_\beta^{(1)} = 1.065$
Differentiation among aggregates at each level
2 Region                   $\Delta_D^{(2)} = 0.014$
1 Island (community)       $\Delta_D^{(1)} = 0.033$
We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides “true diversity” measures that are consistent across levels of organization, and therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
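The percentages quoted above appear to be the relative drop in alpha diversity between the regional and island levels. The sketch below recomputes them from the values in Tables 4 to 6 under that assumption; the object names are illustrative, and small differences from the quoted figures can arise from rounding in the published tables.

```r
# Regional and island alpha diversities (order q = 1) from Tables 4, 5, and 6
alpha <- rbind(
  species      = c(region = 37.773, island = 27.752),
  E_coruscans  = c(region = 8.083,  island = 7.077),
  Z_flavescens = c(region = 8.290,  island = 7.690)
)

# Relative loss of diversity when moving from the regional to the island level
relative_drop <- (alpha[, "region"] - alpha[, "island"]) / alpha[, "region"]

print(round(100 * relative_drop, 1))   # percentages
```

The species-level drop is roughly two to four times larger than either genetic drop, which is the discrepancy the paragraph describes.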
In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)’s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)’s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon’s entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
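A hand-sized example may help to see how branch lengths enter the order q = 1 phylogenetic diversity. The sketch below uses the same formula as the gamma step of the iDIPphylo code, exp(-Σ L_i (a_i/T) log(a_i/T)), on two made-up three-species trees; the function name and both trees are hypothetical, and branch abundances are supplied by hand instead of being derived from a Newick string.

```r
# branch_length: length of each branch; branch_abundance: summed relative
# abundance of the tips descending from that branch
phylo_diversity_q1 <- function(branch_length, branch_abundance) {
  keep   <- branch_abundance > 0
  len    <- branch_length[keep]
  a      <- branch_abundance[keep]
  mean_t <- sum(len * a)                      # abundance-weighted mean root-to-tip distance
  pd_q1  <- exp(-sum(len * (a / mean_t) * log(a / mean_t)))
  c(faith_pd = sum(len), mean_T = mean_t, PD_q1 = pd_q1)
}

# Star tree: three equally abundant tips, each on a branch of length 2
star <- phylo_diversity_q1(c(2, 2, 2), c(1, 1, 1) / 3)

# Tips A and B (length 1 each) share an internal branch AB of length 1;
# tip C sits on a branch of length 2; all tips equally abundant
nested <- phylo_diversity_q1(c(A = 1, B = 1, C = 2, AB = 1),
                             c(1/3, 1/3, 1/3, 2/3))

print(round(rbind(star, nested), 3))
```

For the star tree the effective total branch length equals Faith's PD, whereas shared ancestry in the second tree makes it smaller than Faith's PD, in line with the "equally abundant and equally divergent lineages" reading of the measure.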
There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithm may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and “true dissimilarity” properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.
The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.
An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.
ACKNOWLEDGEMENTS
This work was assisted through participation in “Next Generation Genetic Monitoring” Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award #DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center’s Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.
DATA ARCHIVING STATEMENT
All data used in this manuscript are available in DRYAD (https://doi.org/dx.doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).
ORCID
Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493
Christine Edwards http://orcid.org/0000-0001-8837-4872
REFERENCES
Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2
Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101
Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.
Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis “marshi” (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665
Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.
Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.
Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654
Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551
Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247
Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272
Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540
Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471
Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014
Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819
Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083
Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1
Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2
Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036
Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75
Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352
Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x
Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x
Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1
Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x
Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum x falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.
Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.
Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.
Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3
Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x
Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.
Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x
Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x
Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x
Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.
Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x
Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.
Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.
Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696
Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B-Biological Sciences, 283(1829).
Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a “pristine” coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z
Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765
Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x
Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.
Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422
Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756
de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.
Whitlock, M. C. (2011). G’ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x
Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516
Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.
Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.
How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593
APPENDIX S1 Effective number of alleles: A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an “ideal” population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the “effective” number of alleles in population 2 is two.

Allele   Population 1   Population 2
A1       0.5            0.01
A2       0.5            0.10
A3       0              0.665
A4       0              0.215
A5       0              0.01
He       0.5            0.5
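The equivalence in this table can be checked numerically. The following is a minimal Python sketch (not part of the original supplement), assuming the usual definitions of expected heterozygosity, He = 1 - sum(p_i^2), and of the heterozygosity-based effective number of alleles, 1 / (1 - He); the function names are illustrative.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def effective_number_of_alleles(freqs):
    """Number of equally abundant alleles giving the same He: 1 / (1 - He)."""
    return 1.0 / (1.0 - expected_heterozygosity(freqs))


# Allele frequencies from the table in Appendix S1 (alleles A1..A5).
pop1 = [0.5, 0.5, 0.0, 0.0, 0.0]
pop2 = [0.01, 0.10, 0.665, 0.215, 0.01]

for name, freqs in (("Population 1", pop1), ("Population 2", pop2)):
    he = expected_heterozygosity(freqs)
    ne = effective_number_of_alleles(freqs)
    print(f"{name}: He = {he:.3f}, effective number of alleles = {ne:.3f}")
```

Both populations give He close to 0.5 and an effective number close to two, even though population 2 carries five distinct alleles.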
APPENDIX S2 Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol: Definition
$D_q$: abundance diversity of order q, also referred to as Hill number of order q
$PD_q$: phylogenetic diversity of order q
$S$: total number of distinct elements (= species/alleles)
$H$: Shannon entropy
$K$: number of regions
$J_k$: number of local populations/communities within region k
$w_{jk}$: weight given to population/community j of region k
$w_{+k}$: weight given to region k
$D_\alpha^{(l)}$: alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$: beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$: gamma abundance diversity at level l of the hierarchy
$l$: superscript to denote population/community, subregion, region, …
$D_\gamma$: gamma abundance diversity at the ecosystem level
$N_{injk}$: number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$: total number of copies of allele/species i in population/community j of region k
$N_{+jk}$: total number of alleles/individuals in population j of region k
$N_{++k}$: total number of alleles/individuals in region k
$N_{+++}$: total number of alleles/individuals in the ecosystem
$p_{i|jk}$: frequency of allele/species i in population j of region k
$p_{i|+k}$: frequency of allele/species i in region k
$p_{i|++}$: frequency of allele/species i in the ecosystem
$H_{\alpha,jk}^{(1)}$: alpha entropy for population j of region k
$H_{\alpha,k}^{(2)}$: alpha entropy for region k
$H_\alpha^{(l)}$: total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$: abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$: number of branch segments in the phylogenetic tree
$L_i$: length of branch i = 1, 2, 3, …, B
$a_i$: total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$: mean branch length
$T$: depth of an ultrametric tree ($= \bar{T}$)
$Q$: Rao’s quadratic entropy
$I$: phylogenetic entropy
$a_{i|jk}$: total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$: total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$: total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$: alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$: beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$: gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$: gamma phylogenetic diversity at the ecosystem level
$I_{\alpha,jk}^{(1)}$: alpha phylogenetic entropy for population j of region k
$I_{\alpha,k}^{(2)}$: alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$: total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$: phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)
APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term “allele frequency” with “species frequency”. We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{m=1}^{J} w_m p_{i|m} \Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$
Theorem S3.1 For the gamma and alpha entropies defined above for the two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.
Proof: Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

$$H_\gamma = -\sum_{i=1}^{S} \Big[ \sum_{j=1}^{J} w_j p_{i|j} \ln \Big( \sum_{m=1}^{J} w_m p_{i|m} \Big) \Big] \le -\sum_{i=1}^{S} \Big[ \sum_{j=1}^{J} w_j p_{i|j} \ln ( w_j p_{i|j} ) \Big]$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \qquad \text{(C1)}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2 Desirable monotonicity and “true dissimilarity” properties for Shannon differentiation
(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation also satisfies the “true dissimilarity” property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.
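The limiting values of Eq. (C1) and the “true dissimilarity” property in part (B) can be illustrated numerically. The following Python sketch is not part of the original supplement; the function names and the choice of three communities with S = 10 and A = 4 are illustrative assumptions, and equal community weights are used for simplicity.

```python
import math


def shannon_entropy(freqs):
    """Shannon entropy -sum(p ln p), skipping zero frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def shannon_differentiation(pops, weights):
    """Eq. (C1): (H_gamma - H_alpha) / (-sum_j w_j ln w_j).

    pops:    list of frequency vectors, one per population (same length).
    weights: population weights summing to 1.
    """
    n_types = len(pops[0])
    pooled = [sum(w * p[i] for w, p in zip(weights, pops)) for i in range(n_types)]
    h_gamma = shannon_entropy(pooled)
    h_alpha = sum(w * shannon_entropy(p) for w, p in zip(weights, pops))
    return (h_gamma - h_alpha) / shannon_entropy(weights)


def communities_with_shared_species(n_comm, s, a):
    """n_comm communities, each with s equally common species, a shared by all."""
    total = a + n_comm * (s - a)
    pops = []
    for c in range(n_comm):
        freqs = [0.0] * total
        for i in range(a):                      # species shared by all communities
            freqs[i] = 1.0 / s
        start = a + c * (s - a)                 # species private to community c
        for i in range(start, start + (s - a)):
            freqs[i] = 1.0 / s
        pops.append(freqs)
    return pops


n_comm, s, a = 3, 10, 4
weights = [1.0 / n_comm] * n_comm
delta = shannon_differentiation(communities_with_shared_species(n_comm, s, a), weights)
delta_identical = shannon_differentiation(communities_with_shared_species(n_comm, s, s), weights)
delta_distinct = shannon_differentiation(communities_with_shared_species(n_comm, s, 0), weights)
print(delta, 1 - a / s, delta_identical, delta_distinct)
```

In this setting the measure should equal 1 − A/S, fall to 0 when every species is shared, and reach 1 when none is shared.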
Proof: For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[ H_\alpha^{(2)} - H_\alpha^{(1)} \big] + \big[ H_\gamma - H_\alpha^{(2)} \big]. \qquad \text{(C2)}$$
Here $H_\alpha^{(1)}$ denotes the within-population information, $\big[ H_\alpha^{(2)} - H_\alpha^{(1)} \big]$ denotes the among-population information within a region, and $\big[ H_\gamma - H_\alpha^{(2)} \big]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3 For the decomposition given in Eq. (C2) in the three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C3)}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \qquad \text{(C4)}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
Proof: To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the “gamma” entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big( \sum_{k=1}^{K} w_{+k} p_{i|+k} \Big) \ln \Big( \sum_{m=1}^{K} w_{+m} p_{i|+m} \Big).$$

The corresponding “alpha” entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele i in population j, with population weights $(w_{1k}/w_{+k}, w_{2k}/w_{+k}, \ldots, w_{J_k k}/w_{+k})$, j = 1, 2, …, $J_k$. The “gamma” entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha,k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k}) p_{i|jk}$. The corresponding “alpha” entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha,jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha,k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha,jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \qquad \text{(C5)}$$
Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln ( w_{jk} / w_{+k} )}. \qquad \text{(C6)}$$
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$H_\gamma = -\sum_{i=1}^{S} \Big[ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln ( w_{jk} p_{i|jk} ) \Big]$$

$$= -\sum_{i=1}^{S} \Big[ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big( \ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k} \Big) \Big]$$

$$= -\sum_{i=1}^{S} \Big[ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk} \Big] - \sum_{i=1}^{S} \Big[ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \frac{w_{jk}}{w_{+k}} \Big] - \sum_{i=1}^{S} \Big[ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k} \Big]$$

$$= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k}$$

$$\equiv H_\alpha^{(1)} + \big[ H_\alpha^{(2)} - H_\alpha^{(1)} \big] + \big[ H_\gamma - H_\alpha^{(2)} \big].$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and “true dissimilarity” properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy
For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The gamma and alpha phylogenetic entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[ I_\alpha^{(2)} - I_\alpha^{(1)} \big] + \big[ I_\gamma - I_\alpha^{(2)} \big]. \qquad \text{(C7)}$$
Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $\big[ I_\alpha^{(2)} - I_\alpha^{(1)} \big]$ denotes the among-population phylogenetic information within a region, and $\big[ I_\gamma - I_\alpha^{(2)} \big]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4 For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C8)}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \qquad \text{(C9)}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and “true dissimilarity” properties, are parallel to those of allelic diversity and thus are omitted.
SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represents the ecosystem. Shannon entropies $H_{*}$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 panel: hierarchy levels 3 (Ecosystem), 2 (Region) and 1 (Population or Community), with columns Within, Between, Total and Decomposition; frequencies $p_{i|11}, p_{i|21}, p_{i|31}, p_{i|42}, p_{i|52}$ for populations, $p_{i|+1}, p_{i|+2}$ for regions and $p_{i|++}$ for the ecosystem; entropies $H_{\alpha,11}^{(1)}, H_{\alpha,21}^{(1)}, H_{\alpha,31}^{(1)}, H_{\alpha,42}^{(1)}, H_{\alpha,52}^{(1)}$, $H_{\alpha,1}^{(2)}, H_{\alpha,2}^{(2)}$, $H_\alpha^{(1)}$, $H_\alpha^{(2)}$ and $H_\gamma$.]
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (five in the present example) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $(a_1, a_2, \ldots, a_5, a_6, a_7, a_8) = (p_1, p_2, \ldots, p_5, p_1 + p_2, p_1 + p_2 + p_3, p_4 + p_5)$, where $p_i$ is the relative abundance of species i.

[Figure S2 panel: tips $p_1, p_2, p_3, p_4, p_5$; internal nodes $p_1 + p_2$, $p_1 + p_2 + p_3$ and $p_4 + p_5$; root labelled MRCA.]
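The extended abundance set described for Figure S2 can be built mechanically from the tip abundances. The Python sketch below is not part of the original supplement; the tip abundances are illustrative values, and the node numbering (6, 7, 8) and descendant sets are assumptions read off the figure labels.

```python
def extended_abundances(tip_abundance, internal_nodes):
    """Append, for every internal node, the summed abundance of its descendant tips."""
    extended = dict(tip_abundance)
    for node, descendants in internal_nodes.items():
        extended[node] = sum(tip_abundance[tip] for tip in descendants)
    return extended


# Illustrative relative abundances p_1..p_5 of the five tips (sum to 1).
tip_abundance = {1: 0.10, 2: 0.20, 3: 0.30, 4: 0.25, 5: 0.15}

# Internal nodes of the example tree and the tips descended from each of them.
internal_nodes = {6: (1, 2), 7: (1, 2, 3), 8: (4, 5)}

a = extended_abundances(tip_abundance, internal_nodes)
for node in sorted(a):
    print(f"a_{node} = {a[node]:.2f}")
```

Note that the extended set does not sum to one, because every tip is counted once for itself and once for each internal node above it.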
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called “Abun” and “Struc”, respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1
        Population 1, Population 2
    Region 2
        Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):
Pop1 Pop2 Pop3 Pop4 Pop5
Allele1 1 16 2 10 15
Allele2 0 0 0 5 14
Allele3 7 12 11 1 0
Allele4 0 5 14 1 21
Allele5 2 1 0 11 10
Allele6 0 1 3 2 0
The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure of any number of levels can be expressed in a similar manner.
Pop1 Pop2 Pop3 Pop4 Pop5
Level3 Ecosystem Ecosystem Ecosystem Ecosystem Ecosystem
Level2 Region1 Region1 Region2 Region2 Region2
Level1 Population1 Population2 Population3 Population4 Population5
For the above simple example, the input data for R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

# Run R function
iDIP(Data, Struc)

Output is given below:

                    [,1]
D_gamma            5.272
D_alpha.2          4.679
D_alpha.1          3.540
D_beta.2           1.127
D_beta.1           1.322
Proportion.2       0.300
Proportion.1       0.700
Differentiation.2  0.204
Differentiation.1  0.310
We give simple interpretations for the above output (“effective” is always in the sense of equally abundant alleles/populations/regions):
(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha.2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta.2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).
(1c) D_alpha.1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta.1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha.2).
(2) Proportion.2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion.1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation.2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation.1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
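As an independent cross-check of the worked example, the three-level partitioning can be re-implemented in a few lines. The Python sketch below is not the authors' code: it is a hedged re-implementation of Eqs. (C2), (C5) and (C6) under the assumptions that weights are relative sample sizes, that every population contains at least one allele, and that some differentiation is present; the function name `idip` and the dictionary keys are illustrative.

```python
import math


def entropy(counts):
    """Shannon entropy of a count vector (zero counts are skipped)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def idip(abun, struc):
    """Hierarchical Shannon partitioning with size-based weights.

    abun:  rows = alleles/species, columns = populations (raw counts).
    struc: one list of labels per level, ordered from the ecosystem (top)
           down to the populations (bottom); one label per column of abun.
    """
    n_rows, grand = len(abun), sum(map(sum, abun))
    h_gamma = entropy([sum(row) for row in abun])
    alphas, weight_entropies = [], []
    for labels in struc[1:]:                      # skip the ecosystem level
        groups = list(dict.fromkeys(labels))      # unique labels, order kept
        pooled = {g: [0.0] * n_rows for g in groups}
        for col, g in enumerate(labels):
            for r in range(n_rows):
                pooled[g][r] += abun[r][col]
        w = [sum(pooled[g]) / grand for g in groups]
        alphas.append(sum(wi * entropy(pooled[g]) for wi, g in zip(w, groups)))
        weight_entropies.append(-sum(wi * math.log(wi) for wi in w if wi > 0))
    total_beta = h_gamma - alphas[-1]
    uppers = [h_gamma] + alphas[:-1]              # entropy one level above
    w_above = [0.0] + weight_entropies[:-1]
    out = {"D_gamma": math.exp(h_gamma)}
    for idx, lower in enumerate(alphas):
        level = len(struc) - 1 - idx
        gain = uppers[idx] - lower                # beta information at this level
        out[f"D_alpha.{level}"] = math.exp(lower)
        out[f"D_beta.{level}"] = math.exp(gain)
        out[f"Proportion.{level}"] = gain / total_beta
        out[f"Differentiation.{level}"] = gain / (weight_entropies[idx] - w_above[idx])
    return out


data = [[1, 16, 2, 10, 15], [0, 0, 0, 5, 14], [7, 12, 11, 1, 0],
        [0, 5, 14, 1, 21], [2, 1, 0, 11, 10], [0, 1, 3, 2, 0]]
struc = [["Ecosystem"] * 5,
         ["Region1", "Region1", "Region2", "Region2", "Region2"],
         ["Population1", "Population2", "Population3", "Population4", "Population5"]]
result = idip(data, struc)
for key, value in result.items():
    print(f"{key:18s}{value:.3f}")
```

The denominator of each differentiation value uses the identity that the bound in Eq. (C4) equals the entropy of the population weights minus the entropy of the region weights.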
RcodeforInformation-basedDiversityPartitioningforAnyNumberofLevelsinputdata
abunallelesbypopulationfrequencymatrixdata(orspeciesby communityfrequencymatrixdata) struclevelbypopulationhierarchicalstructurematrix(orlevelbycommunity
hierarchicalstructurematrix)output
(1)Gamma(ortotal)diversityalphaandbetadiversityateachlevel (2)Proportionoftotalbetainformationfoundateachlevel(3)Differentiation(dissimilarity)foreachlevelForexampleDifferentiation1
measuresthemeandissimilarityamongpopulations(level1)withinaregionDifferentiation2measuresthemeandissimilarityamongregions(level2)etc
NOTEiDIPcannothandleldquoblankrdquodataorldquoNArdquoentryYoumustreplaceldquoblankrdquo orldquoNArdquoinyourdataby0oranynumericalvalueAlsotheremustbeatleast onespeciesalleleinapopulationoracommunity
MainfunctioniDIP
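iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  # Gamma (pooled) Shannon entropy
  ga = rowSums(abun)
  gp = ga[ga > 0] / n
  G = sum(-gp * log(gp))
  S = length(gp)
  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)
  # Lowest level: population weights and within-population entropies
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H - 1] = sum(wi * Ai)
  # Intermediate levels: pool the populations that belong to the same aggregate
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          ai[, j] = abun[, II]
        } else {
          ai[, j] = rowSums(abun[, II])
        }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i - 1] = sum(wi * Ai)
    }
  }
  # Differentiation, proportion of beta information and beta diversity per level
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }
  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha", (H - 1):1),
                     paste0("D_beta", (H - 1):1),
                     paste0("Proportion", (H - 1):1),
                     paste0("Differentiation", (H - 1):1))
  return(out)
}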
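The core quantities returned by the function can be illustrated with a minimal two-level sketch. The data below are hypothetical (3 alleles in 2 equally sized populations) and the helper `shannon` is introduced only for this illustration; it is not part of the supplied code.

```r
# Hypothetical toy data: rows are alleles, columns are populations.
abun <- cbind(pop1 = c(5, 5, 0), pop2 = c(0, 5, 5))

# Shannon entropy of a frequency vector, ignoring zero entries.
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

w <- colSums(abun) / sum(abun)                      # population weights
H_gamma <- shannon(rowSums(abun) / sum(abun))       # pooled (gamma) entropy
H_alpha <- sum(w * apply(abun, 2, function(x) shannon(x / sum(x))))

D_gamma <- exp(H_gamma)                  # effective number of alleles overall
D_alpha <- exp(H_alpha)                  # effective number per population
D_beta  <- D_gamma / D_alpha             # number of population equivalents
Delta   <- (H_gamma - H_alpha) / shannon(w)  # normalized differentiation

print(c(D_gamma = D_gamma, D_alpha = D_alpha, D_beta = D_beta, Delta = Delta))
```

In this toy case each population holds two equally frequent alleles and shares one of them, so the differentiation equals 0.5, the proportion of non-shared alleles per population.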
Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described for species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree format. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for the species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity for illustration. A simulated phylogenetic tree for 6 species is given below.
Input data:
Data=cbind(c(107020)c(16012511)c(20111403)c(10511112)c(1514021100))
rownames(Data) = paste0("Allele", 1:6)
Struc = rbind(rep("Ecosystem", 5), c(rep("Region1", 2), rep("Region2", 3)), paste0("Community", 1:5))
Tree = c("(((Allele1:1.666254448,Allele2:2.886156926):4.370264926,Allele3:5.919367445):4.349065302,(Allele4:0.967060281,Allele5:4.965919121,Allele6:1.5361314):5.492297125);")

Run R function:
iDIPphylo(Data, Struc, Tree)
Output is given below:

              [,1]
Faith's PD 32.1525
mean_T      9.4169
PD_gamma   27.4388
PD_alpha2  25.5194
PD_alpha1  22.3231
PD_beta2    1.075
PD_beta1    1.143
PD_prop2    0.351
PD_prop1    0.649
PD_diff2    0.124
PD_diff1    0.149
The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.
(1) The total branch length (Faith's PD) in the phylogenetic tree is 32.1525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 9.4169.
(3a) PD_gamma = 27.4388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 27.4388.
(3b) PD_alpha2 = 25.5194 means that the effective total branch length per region is 25.5194.
PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 25.5194 x 1.075 = 27.4388 (= PD_gamma).
(3c) PD_alpha1 = 22.3231 means that the effective total branch length per population within each region is 22.3231.
PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 22.3231 x 1.143 = 25.5194 (= PD_alpha2).
(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%.
PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%.
PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
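Item (1) can be checked by hand, since Faith's PD is simply the sum of all branch lengths in the tree. The sketch below assumes a particular placement of the decimal points in the example Newick string (the separators are not legible in this copy), and the internal-branch names are hypothetical labels used only here.

```r
# Branch lengths of the example tree; decimal placement is an assumption.
# node12, node123 and node456 are hypothetical labels for internal branches.
branch_lengths <- c(
  Allele1 = 1.666254448, Allele2 = 2.886156926, node12 = 4.370264926,
  Allele3 = 5.919367445, node123 = 4.349065302,
  Allele4 = 0.967060281, Allele5 = 4.965919121, Allele6 = 1.5361314,
  node456 = 5.492297125
)

# Faith's PD: total branch length of the tree.
faith_pd <- sum(branch_lengths)
print(round(faith_pd, 4))
```

Under this reading the sum agrees with the reported Faith's PD to four decimals, which supports the assumed decimal placement.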
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

Input data:
abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study

NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

Output:
(1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
(2) mean_T: weighted (by species abundance) mean of the distances from the root node to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
(3) Gamma (or total) phylogenetic diversity (PD) of order 1, and alpha and beta PD for each level.
(4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information found in Level 1 and Level 2, respectively.
(5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1 measures the mean phylogenetic dissimilarity among communities (Level 1) within a region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among regions (Level 2), etc.

Three packages, "ade4", "ape" and "phytools", must be installed first.
install.packages("ade4"); library(ade4)
install.packages("ape"); library(ape)
install.packages("phytools"); library(phytools)

Main function: iDIPphylo

iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, b] = 1 if branch b lies on the path from the root to leaf i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }
  # pA: abundance carried by each branch in each population
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
  pB = c(phyloData$leaves, phyloData$nodes)   # branch lengths
  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n; TT = sum(gp * pB)
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])
  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H - 1] = sum(wi * Ai)
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else {
          pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
        }
      }
      # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = ni / sum(ni)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i - 1] = sum(wi * Ai)
    }
  }
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }
  Gamma = exp(G) * TT; Alpha = exp(A) * TT
  # Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  # out1 = iDIP(abun, struc); out = cbind(out1, out2)
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H - 1):1),
                     paste0("PD_beta", (H - 1):1),
                     paste0("PD_prop", (H - 1):1),
                     paste0("PD_diff", (H - 1):1))
  return(out)
}
of uninhabited low islands, atolls, shoals, and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification, and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km southwest of the MHI at Johnston Atoll and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).
We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.
7.1 | Species diversity
Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, $D_\gamma$, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_\alpha^{(2)} = 37.77$; Island: $D_\alpha^{(1)} = 27.75$). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_\beta^{(2)} = 1.29$ represents the number of region equivalents in the Hawaiian archipelago, while $D_\beta^{(1)} = 1.361$ is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on the sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain $\Delta_D$ (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI, and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
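The multiplicative relation between levels, and the loss of about 10 species equivalents per level, can be checked from the Table 4 values. This is only a rough sketch: the decimal placement of the table entries is assumed here, and small mismatches are expected because the published values are rounded.

```r
# Table 4 values (decimal placement assumed from the text's rounded figures).
D_gamma  <- 48.744
D_alpha2 <- 37.773; D_beta2 <- 1.290   # regional level
D_alpha1 <- 27.752; D_beta1 <- 1.361   # island level

# Alpha times beta at one level should recover the diversity one level up.
gamma_check  <- D_alpha2 * D_beta2     # compare with D_gamma
alpha2_check <- D_alpha1 * D_beta1     # compare with D_alpha2

# Species equivalents lost when descending one level.
loss <- c(archipelago_to_region = D_gamma - D_alpha2,
          region_to_island      = D_alpha2 - D_alpha1)

print(round(c(gamma_check = gamma_check, alpha2_check = alpha2_check), 2))
print(round(loss, 2))
```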
We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than the diversities of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and to a lesser extent Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
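The effect of the order q can be sketched with a small function for Hill numbers. The function name `hill` and the abundance vector are hypothetical and used only for illustration; the vector is chosen so that one species accounts for 55% of individuals, loosely echoing the Nihoa case.

```r
# Hill number (effective number of species) of order q for raw abundances.
# Zero abundances are dropped; q = 1 is defined as the limit exp(Shannon entropy).
hill <- function(x, q) {
  p <- x[x > 0] / sum(x)
  if (q == 1) {
    exp(-sum(p * log(p)))
  } else {
    sum(p^q)^(1 / (1 - q))
  }
}

# Hypothetical community dominated by one species, with several rare ones.
x <- c(55, 15, 10, 10, 5, 2, 1, 1, 1)
profile <- sapply(c(0, 1, 2), function(q) hill(x, q))
names(profile) <- c("q0", "q1", "q2")
print(round(profile, 2))
```

Richness (q = 0) counts all nine species, whereas the values for q = 1 and q = 2 are much smaller and closer to each other, which is the signature of dominance by a few species described above.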
Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.
7.2 | Genetic diversity
Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases, genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed
TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

Level                       Diversity
3 Hawaiian Archipelago      $D_\gamma = 48.744$
2 Region                    $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)} = 37.773$, $D_\beta^{(2)} = 1.290$
1 Island (community)        $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)} = 27.752$, $D_\beta^{(1)} = 1.361$

Differentiation among aggregates at each level
2 Region                    $\Delta_D^{(2)} = 0.290$
1 Island (community)        $\Delta_D^{(1)} = 0.153$
among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that, despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.
Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.
As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).
8 | DISCUSSION
Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications, including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.
Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and, therefore, give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g.,
FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1, and 2. (a) Fish species diversity of Hawaiian coral reef communities; (b) genetic diversity for Etelis coruscans; (c) genetic diversity for Zebrasoma flavescens
[Panel titles: (a) Species diversity; (b) E. coruscans; (c) Z. flavescens]
bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.
As proved by Chao et al. (2015, appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population, and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as $G_{ST}$ and Jost's D, do not possess either of these two properties.
Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and
FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness; (b) allelic richness for Etelis coruscans; (c) allelic richness for Zebrasoma flavescens
[Panel titles: (a) Species diversity; (b) Genetic diversity, E. coruscans; (c) Genetic diversity, Z. flavescens]
TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level                       Diversity
3 Hawaiian Archipelago      $D_\gamma = 8.249$
2 Region                    $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)} = 8.083$, $D_\beta^{(2)} = 1.016$
1 Island (population)       $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)} = 7.077$, $D_\beta^{(1)} = 1.117$

Differentiation among aggregates at each level
2 Region                    $\Delta_D^{(2)} = 0.023$
1 Island (community)        $\Delta_D^{(1)} = 0.062$
our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.
Recently, Karlin and Smouse (2017, Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
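The two-population example can be reproduced with a few lines of R. This is a minimal sketch that assumes equal population weights and uses the two-level form of the information-based differentiation, (gamma entropy minus alpha entropy) divided by the entropy of the weights; the helper `shannon` and the allele layout are introduced only for illustration.

```r
# Shannon entropy of a frequency vector, ignoring zero entries.
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# 16 distinct alleles: 1-4 shared, 5-10 private to I, 11-16 private to II.
# Each population carries 10 equally frequent alleles (frequency 0.1 each).
popI  <- c(rep(0.1, 4), rep(0.1, 6), rep(0, 6))
popII <- c(rep(0.1, 4), rep(0, 6), rep(0.1, 6))
w <- c(0.5, 0.5)                                  # assumed equal weights

H_gamma <- shannon(w[1] * popI + w[2] * popII)    # pooled entropy
H_alpha <- w[1] * shannon(popI) + w[2] * shannon(popII)
Delta   <- (H_gamma - H_alpha) / shannon(w)       # Shannon differentiation

print(Delta)
```

The result equals 1 − A/S = 1 − 4/10, the true proportion of nonshared alleles per population.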
In this paper, all diversity measures (alpha, beta, and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown, and all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would surely be observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., obtained by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity $D_\beta$ and differentiation $\Delta_D$ measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.
The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species and, therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.
Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural
TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

Level                       Diversity
3 Hawaiian Archipelago      $D_\gamma = 8.404$
2 Region                    $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)} = 8.290$, $D_\beta^{(2)} = 1.012$
1 Island (community)        $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)} = 7.690$, $D_\beta^{(1)} = 1.065$

Differentiation among aggregates at each level
2 Region                    $\Delta_D^{(2)} = 0.014$
1 Island (community)        $\Delta_D^{(1)} = 0.033$
populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.
We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization and, therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
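The percentages quoted above can be approximately recovered from the regional and island alpha diversities in Tables 4 to 6. The sketch below assumes the decimal placement of the table values and computes the relative drop from the regional to the island level; small deviations from the quoted figures are expected because the table entries are rounded.

```r
# Alpha diversities of order q = 1 (decimal placement assumed).
region <- c(species = 37.773, E_coruscans = 8.083, Z_flavescens = 8.290)
island <- c(species = 27.752, E_coruscans = 7.077, Z_flavescens = 7.690)

# Relative difference between the regional and island levels, in percent.
pct_drop <- 100 * (region - island) / region
print(round(pct_drop, 1))
```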
In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)'s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)'s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithms may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.
The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.
An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.
ACKNOWLEDGEMENTS
This work was assisted through participation in the "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED), with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.
DATA ARCHIVING STATEMENT
All data used in this manuscript are available in DRYAD (https://doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).
ORCID
Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493
Christine Edwards http://orcid.org/0000-0001-8837-4872
REFERENCES
Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2
Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101
Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.
Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis “marshi” (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665
Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.
Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.
Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654
Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551
Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247
Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272
Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540
Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471
Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014
Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819
Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083
Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1
Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2
Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036
Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75
Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352
Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x
Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x
Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1
Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x
Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum × falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.
Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.
Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.
Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3
MacArthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x
Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.
Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x
Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x
Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x
Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.
Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x
Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.
Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.
Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696
Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B: Biological Sciences, 283(1829).
Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a “pristine” coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z
Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765
Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x
Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.
Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422
Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756
de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.
Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x
Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516
Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.
Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.
How to cite this article: Gaggiotti, O. E., Chao, A., Peres-Neto, P., et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593
APPENDIX S1. Effective number of alleles: A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an “ideal” population, i.e. a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the “effective” number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
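The equivalence in the table can be checked numerically. The sketch below computes the expected heterozygosity of each population and converts it to an effective number of alleles with 1/(1 − He), the usual heterozygosity-based conversion. The helper names `heterozygosity` and `effective_alleles` are ours and are introduced only for illustration.

```r
# Allele frequencies from the table above (populations 1 and 2).
pop1 <- c(A1 = 0.5,  A2 = 0.5,  A3 = 0,     A4 = 0,     A5 = 0)
pop2 <- c(A1 = 0.01, A2 = 0.10, A3 = 0.665, A4 = 0.215, A5 = 0.01)

# Expected heterozygosity: probability that two randomly drawn alleles differ.
heterozygosity <- function(p) 1 - sum(p^2)

# Effective number of alleles (hypothetical helper name): the number of
# equally frequent alleles that would give the same heterozygosity.
effective_alleles <- function(p) 1 / (1 - heterozygosity(p))

he <- c(pop1 = heterozygosity(pop1), pop2 = heterozygosity(pop2))
ne <- c(pop1 = effective_alleles(pop1), pop2 = effective_alleles(pop2))
print(round(he, 2))
print(round(ne, 2))
```

Both populations give a heterozygosity close to 0.5, and therefore an effective number of alleles close to two, even though population 2 carries five distinct alleles.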
APPENDIX S2. Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol                        Definition
$^{q}D$                       abundance diversity of order q, also referred to as Hill number of order q
$^{q}PD$                      phylogenetic diversity of order q
$S$                           total number of distinct elements (species/alleles)
$H$                           Shannon entropy
$K$                           number of regions
$J_k$                         number of local populations/communities within region k
$w_{jk}$                      weight given to population/community j of region k
$w_{+k}$                      weight given to region k
$D_{\alpha}^{(l)}$            alpha abundance diversity at level l of the hierarchy
$D_{\beta}^{(l)}$             beta abundance diversity at level l of the hierarchy
$D_{\gamma}^{(l)}$            gamma abundance diversity at level l of the hierarchy
$l$                           superscript to denote population/community, subregion, region, …
$D_{\gamma}$                  gamma abundance diversity at the ecosystem level
$N_{injk}$                    number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$                     total number of copies of allele/species i in population/community j of region k
$N_{+jk}$                     total number of alleles/individuals in population j of region k
$N_{++k}$                     total number of alleles/individuals in region k
$N_{+++}$                     total number of alleles/individuals in the ecosystem
$p_{i|jk}$                    frequency of allele/species i in population j of region k
$p_{i|+k}$                    frequency of allele/species i in region k
$p_{i|++}$                    frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$         alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$          alpha entropy for region k
$H_{\alpha}^{(l)}$            total alpha entropy at level l of the hierarchy
$\Delta_{D}^{(l)}$            abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$                           number of branch segments in the phylogenetic tree
$L_i$                         length of branch i = 1, 2, 3, …, B
$a_i$                         total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$                     mean branch length
$T$                           depth of an ultrametric tree ($= \bar{T}$)
$Q$                           Rao's quadratic entropy
$I$                           phylogenetic entropy
$a_{i|jk}$                    total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$                    total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$                    total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_{\alpha}^{(l)}$           alpha phylogenetic diversity at level l of the hierarchy
$PD_{\beta}^{(l)}$            beta phylogenetic diversity at level l of the hierarchy
$PD_{\gamma}^{(l)}$           gamma phylogenetic diversity at level l of the hierarchy
$PD_{\gamma}$                 gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$         alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$          alpha phylogenetic entropy for region k
$I_{\alpha}^{(l)}$            total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$           phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)
APPENDIX S3. Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term “allele frequency” with “species frequency”. We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_{\gamma} = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big)$$

and

$$H_{\alpha} = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_{\gamma} - H_{\alpha} \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_{\gamma} - H_{\alpha} = 0$; when the J populations are completely distinct (no shared alleles), we have $H_{\gamma} - H_{\alpha} = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_{\gamma} \ge H_{\alpha}$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e. all J populations have identical allele frequency distributions. The maximum value of $H_{\gamma} - H_{\alpha}$ is obtained as follows. Because the logarithm is increasing, $\ln\big(\sum_{j} w_j p_{i|j}\big) \ge \ln(w_j p_{i|j})$ for every j, so that

$$H_{\gamma} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \le -\sum_{i=1}^{S} \sum_{j=1}^{J} w_j p_{i|j} \ln\big(w_j p_{i|j}\big)$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_{\alpha} - \sum_{j=1}^{J} w_j \ln w_j.$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_{D} = \frac{H_{\gamma} - H_{\alpha}}{-\sum_{j=1}^{J} w_j \ln w_j}. \qquad \text{(C1)}$$
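As a minimal sketch of Eq. (C1), assuming the allele frequencies are held in an alleles-by-populations matrix whose columns sum to one, the measure and its two boundary cases can be computed as follows. The function names `shannon` and `shannon_diff` and the toy frequencies are ours, not part of the original supplement.

```r
# Shannon entropy (natural log); zero frequencies contribute nothing.
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Two-level Shannon differentiation, Eq. (C1).
# P: alleles-by-populations matrix of relative frequencies (columns sum to 1).
# w: population weights summing to 1.
shannon_diff <- function(P, w) {
  h_gamma <- shannon(as.vector(P %*% w))     # entropy of the pooled frequencies
  h_alpha <- sum(w * apply(P, 2, shannon))   # weighted mean within-population entropy
  (h_gamma - h_alpha) / shannon(w)           # normalize by the upper bound
}

w <- c(0.3, 0.7)
identical_pops <- cbind(c(0.2, 0.8, 0, 0), c(0.2, 0.8, 0, 0))
disjoint_pops  <- cbind(c(0.2, 0.8, 0, 0), c(0, 0, 0.5, 0.5))
d_identical <- shannon_diff(identical_pops, w)
d_disjoint  <- shannon_diff(disjoint_pops, w)
print(c(identical = d_identical, disjoint = d_disjoint))
```

The first case should return a value at (or numerically indistinguishable from) zero and the second a value at one, matching the two equality conditions of Theorem S3.1.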
The Shannon differentiation measure in Eq. (C1) takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). It also satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and “true dissimilarity” properties for Shannon differentiation.
(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation also satisfies the “true dissimilarity” property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.
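Property (B) can be illustrated numerically. The sketch below builds N communities that each hold S equally common species, A of which are shared by every community, and evaluates Eq. (C1) with equal community weights (an assumption made here for simplicity). The helper `make_communities` and the values N = 4, S = 10, A = 3 are hypothetical.

```r
# Shannon entropy (natural log); zero frequencies contribute nothing.
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# Build N communities, each with S equally common species: A species are
# shared by every community and the remaining S - A are unique to it.
make_communities <- function(N, S, A) {
  total <- A + N * (S - A)
  P <- matrix(0, nrow = total, ncol = N)
  for (j in seq_len(N)) {
    if (A > 0) P[seq_len(A), j] <- 1 / S
    if (S > A) P[A + (j - 1) * (S - A) + seq_len(S - A), j] <- 1 / S
  }
  P
}

N <- 4; S <- 10; A <- 3
P <- make_communities(N, S, A)
w <- rep(1 / N, N)                                   # equal weights (assumption)

h_gamma <- shannon(as.vector(P %*% w))
h_alpha <- sum(w * apply(P, 2, shannon))
delta   <- (h_gamma - h_alpha) / shannon(w)          # Eq. (C1)
print(c(delta = delta, expected = 1 - A / S))
```

With these settings the computed differentiation should coincide with 1 − A/S = 0.7, the proportion of non-shared species in each community.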
Proof. For Part (A), see Appendix S6 in Chao, Jost et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_{\gamma} = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_{\alpha}^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_{\alpha}^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_{\gamma} = H_{\alpha}^{(1)} + \big[H_{\alpha}^{(2)} - H_{\alpha}^{(1)}\big] + \big[H_{\gamma} - H_{\alpha}^{(2)}\big]. \qquad \text{(C2)}$$

Here $H_{\alpha}^{(1)}$ denotes the within-population information, $[H_{\alpha}^{(2)} - H_{\alpha}^{(1)}]$ denotes the among-population information within a region, and $[H_{\gamma} - H_{\alpha}^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:

$$0 \le H_{\gamma} - H_{\alpha}^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C3)}$$

$$0 \le H_{\alpha}^{(2)} - H_{\alpha}^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \qquad \text{(C4)}$$

When all populations have identical allele relative frequency distributions, we have $H_{\gamma} = H_{\alpha}^{(2)} = H_{\alpha}^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_{\alpha}^{(2)} - H_{\alpha}^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_{\gamma} - H_{\alpha}^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the “gamma” entropy for this two-level system is

$$H_{\gamma} = -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} w_{+k} p_{i|+k}\Big) \ln\Big(\sum_{k=1}^{K} w_{+k} p_{i|+k}\Big).$$

The corresponding “alpha” entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_{\alpha}^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele i in population j, with population weights $(w_{1k}/w_{+k}, w_{2k}/w_{+k}, \ldots, w_{J_k k}/w_{+k})$, $j = 1, 2, \ldots, J_k$. The “gamma” entropy for this two-level hierarchy is the entropy value for region k, i.e. $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding “alpha” entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_{\alpha}^{(2)} - H_{\alpha}^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$
This proves Eq. (C4). From the above theorem, we have $0 \le H_{\alpha}^{(1)} \le H_{\alpha}^{(2)} \le H_{\gamma}$, i.e. the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_{D}^{(2)} = \frac{H_{\gamma} - H_{\alpha}^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \qquad \text{(C5)}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_{D}^{(1)} = \frac{H_{\alpha}^{(2)} - H_{\alpha}^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \qquad \text{(C6)}$$
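The decomposition in Eq. (C2) and the two normalized measures in Eqs. (C5) and (C6) can be sketched directly from a count matrix. The example below uses the allele counts of the worked iDIP example given later in this supplement and takes relative population sizes as weights, which is one of the weighting choices the theory allows and an assumption of this sketch. Variable names are ours.

```r
# Shannon entropy (natural log); zero frequencies contribute nothing.
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# Raw allele counts (alleles in rows, populations in columns).
counts <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
                c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)                      # region membership of each population

n    <- sum(counts)
w_jk <- colSums(counts) / n                     # population weights (relative sizes)
w_k  <- as.vector(tapply(w_jk, region, sum))    # region weights w_{+k}

# Pool the counts of all populations belonging to each region.
reg <- sapply(sort(unique(region)),
              function(k) rowSums(counts[, region == k, drop = FALSE]))

h_gamma  <- shannon(rowSums(counts) / n)
h_alpha2 <- sum(w_k  * apply(reg,    2, function(x) shannon(x / sum(x))))
h_alpha1 <- sum(w_jk * apply(counts, 2, function(x) shannon(x / sum(x))))

max2  <- shannon(w_k)                           # upper bound in Eq. (C3)
max1  <- -sum(w_jk * log(w_jk / w_k[region]))   # upper bound in Eq. (C4)
diff2 <- (h_gamma  - h_alpha2) / max2           # Eq. (C5), among regions
diff1 <- (h_alpha2 - h_alpha1) / max1           # Eq. (C6), among populations within regions

print(round(c(D_gamma = exp(h_gamma), D_alpha.2 = exp(h_alpha2),
              D_alpha.1 = exp(h_alpha1),
              Differentiation.2 = diff2, Differentiation.1 = diff1), 3))
```

Under these assumptions the printed values should agree, up to rounding, with the iDIP output reported for the same data in the R code section of this supplement.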
Each of the two differentiation measures in Eqs. (C5) and (C6) takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case we can decompose the gamma entropy as

$$H_{\gamma} = -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\big(w_{jk} p_{i|jk}\big)$$

$$= -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big[\ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big]$$

$$= -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk} - \sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k}$$

$$= H_{\alpha}^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k}$$

$$\equiv H_{\alpha}^{(1)} + \big[H_{\alpha}^{(2)} - H_{\alpha}^{(1)}\big] + \big[H_{\gamma} - H_{\alpha}^{(2)}\big].$$

In this special case, we have $H_{\alpha}^{(2)} - H_{\alpha}^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_{\gamma} - H_{\alpha}^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and “true dissimilarity” properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy
For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_{\gamma} = -\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++}, \qquad I_{\alpha}^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k},$$

$$I_{\alpha}^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_{\gamma} = I_{\alpha}^{(1)} + \big[I_{\alpha}^{(2)} - I_{\alpha}^{(1)}\big] + \big[I_{\gamma} - I_{\alpha}^{(2)}\big]. \qquad \text{(C7)}$$
Here $I_{\alpha}^{(1)}$ denotes the within-population phylogenetic information, $[I_{\alpha}^{(2)} - I_{\alpha}^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_{\gamma} - I_{\alpha}^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_{\gamma} - I_{\alpha}^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C8)}$$

$$0 \le I_{\alpha}^{(2)} - I_{\alpha}^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \qquad \text{(C9)}$$

When all populations have identical allele relative frequency distributions, we have $I_{\gamma} = I_{\alpha}^{(2)} = I_{\alpha}^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_{\alpha}^{(2)} - I_{\alpha}^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $I_{\gamma} - I_{\alpha}^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and “true dissimilarity” properties, are parallel to those of allelic diversity and thus are omitted.
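Although the derivation is omitted, the origin of the factor T in the upper bound of Eq. (C8) can be sketched in a few lines. The following is our reconstruction under the stated assumption of an ultrametric tree, for which $\sum_{i=1}^{B} L_i a_{i|+k} = T$ in every region; it is not the authors' wording. It applies, branch by branch, the same monotonicity argument used for the upper bound in Theorem S3.1.

```latex
\begin{align*}
I_{\gamma} - I_{\alpha}^{(2)}
 &= \sum_{i=1}^{B} L_i \Big[ -a_{i|++} \ln a_{i|++}
      + \sum_{k=1}^{K} w_{+k}\, a_{i|+k} \ln a_{i|+k} \Big],
 \qquad a_{i|++} = \sum_{k=1}^{K} w_{+k}\, a_{i|+k} \\
 &\le \sum_{i=1}^{B} L_i \Big[ -\sum_{k=1}^{K} w_{+k}\, a_{i|+k}
      \ln\!\big(w_{+k}\, a_{i|+k}\big)
      + \sum_{k=1}^{K} w_{+k}\, a_{i|+k} \ln a_{i|+k} \Big] \\
 &= -\sum_{k=1}^{K} w_{+k} \ln w_{+k} \sum_{i=1}^{B} L_i\, a_{i|+k}
  \;=\; -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}.
\end{align*}
```

The bound in Eq. (C9) follows in the same way after replacing the region weights by the within-region weights $w_{jk}/w_{+k}$ and summing over regions with weight $w_{+k}$.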
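SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represents the ecosystem. Shannon entropies $H_{*}$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using Eq. 1.

[Figure S1: diagram with rows for level 3 (Ecosystem), level 2 (Region) and level 1 (Population or Community), and columns Within, Between, Total and Decomposition. Population frequencies $p_{i|11}, p_{i|21}, p_{i|31}, p_{i|42}, p_{i|52}$ give the entropies $H_{\alpha 11}^{(1)}, H_{\alpha 21}^{(1)}, H_{\alpha 31}^{(1)}, H_{\alpha 42}^{(1)}, H_{\alpha 52}^{(1)}$ and their total $H_{\alpha}^{(1)}$; regional frequencies $p_{i|+1}, p_{i|+2}$ give $H_{\alpha 1}^{(2)}, H_{\alpha 2}^{(2)}$ and their total $H_{\alpha}^{(2)}$; ecosystem frequencies $p_{i|++}$ give $H_{\gamma}$.]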
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case the calculation of effective numbers is based on an extended set where the first S elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2: tree diagram with tips labelled $p_1, p_2, p_3, p_4, p_5$, internal nodes labelled $p_1 + p_2$, $p_1 + p_2 + p_3$ and $p_4 + p_5$, and the root labelled MRCA.]
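The extended abundance set of Figure S2 can be written out as a short sketch. The tip abundances and branch lengths below are hypothetical values chosen by us so that the tree is ultrametric with depth 4; only the node structure comes from the figure. The last line uses the order-1 relation between phylogenetic entropy and phylogenetic diversity, PD = T exp(I/T), as given in Chao et al. (2010).

```r
# Relative abundances of the five tips (hypothetical values).
p <- c(0.30, 0.20, 0.10, 0.25, 0.15)

# Extended abundance set: five tips followed by the three internal nodes
# of Figure S2 (p1 + p2, p1 + p2 + p3, p4 + p5).
a <- c(p, p[1] + p[2], p[1] + p[2] + p[3], p[4] + p[5])

# Hypothetical branch lengths for the eight branches, in the same order,
# chosen so that every root-to-tip path has length 4 (ultrametric tree).
L <- c(1, 1, 2, 3, 3, 1, 2, 1)

T_depth <- sum(L * a)                        # equals the tree depth for an ultrametric tree
I_phylo <- -sum(L * a * log(a))              # phylogenetic entropy
PD_1    <- T_depth * exp(I_phylo / T_depth)  # phylogenetic diversity of order 1
print(c(T = T_depth, I = I_phylo, PD = PD_1))
```

Because every tip lies at the same distance from the root, the abundance-weighted sum of branch lengths recovers the tree depth regardless of the tip abundances chosen.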
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R function iDIP)

Input should include two data matrices (called “Abun” and “Struc”, respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region1
        Population1
        Population2
    Region2
        Population3
        Population4
        Population5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

           Pop1   Pop2   Pop3   Pop4   Pop5
Allele1       1     16      2     10     15
Allele2       0      0      0      5     14
Allele3       7     12     11      1      0
Allele4       0      5     14      1     21
Allele5       2      1      0     11     10
Allele6       0      1      3      2      0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure of any number of levels can be expressed in a similar manner.

           Pop1          Pop2          Pop3          Pop4          Pop5
Level3     Ecosystem     Ecosystem     Ecosystem     Ecosystem     Ecosystem
Level2     Region1       Region1       Region2       Region2       Region2
Level1     Population1   Population2   Population3   Population4   Population5

For the above simple example, input data for the R function iDIP (code given below) are shown below.

# Input data
Data=cbind(c(1,0,7,0,2,0),c(16,0,12,5,1,1),c(2,0,11,14,0,3),c(10,5,1,1,11,2),c(15,14,0,21,10,0))
Struc=rbind(rep("Ecosystem",5),c(rep("Region1",2),rep("Region2",3)),paste0("Population",1:5))
# Run R function
iDIP(Data,Struc)
Output is given below.

                     [,1]
D_gamma             5.272
D_alpha.2           4.679
D_alpha.1           3.540
D_beta.2            1.127
D_beta.1            1.322
Proportion.2        0.300
Proportion.1        0.700
Differentiation.2   0.204
Differentiation.1   0.310
We give simple interpretations for the above output (every “effective” quantity is in the sense of equally abundant alleles/populations/regions).
(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha.2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta.2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 × 1.127 = 5.272 (= D_gamma).
(1c) D_alpha.1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta.1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 × 3.540 = 4.679 allele equivalents per region (= D_alpha.2).
(2) Proportion.2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion.1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation.2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation.1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e. the mean proportion of non-shared alleles in a population is around 31.0%.
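The arithmetic behind interpretations (1b), (1c) and (2) can be verified from the reported numbers alone. The sketch below re-enters the rounded output values by hand (the vector `out` is ours) and checks the multiplicative partitioning of diversity and the additive partitioning of beta information on the log scale; small discrepancies are expected because the inputs are rounded to three decimals.

```r
# Values copied from the iDIP output reported above (rounded to 3 decimals).
out <- c(D_gamma = 5.272, D_alpha.2 = 4.679, D_alpha.1 = 3.540,
         D_beta.2 = 1.127, D_beta.1 = 1.322,
         Proportion.2 = 0.300, Proportion.1 = 0.700)

# Multiplicative partitioning: gamma = alpha x beta at each level.
gamma_check  <- out[["D_alpha.2"]] * out[["D_beta.2"]]   # compare with D_gamma
alpha2_check <- out[["D_alpha.1"]] * out[["D_beta.1"]]   # compare with D_alpha.2

# Beta information is additive on the log scale, so each level's share is
# its log beta diversity divided by the sum of the log beta diversities.
log_beta <- log(c(out[["D_beta.2"]], out[["D_beta.1"]]))
prop     <- log_beta / sum(log_beta)

print(round(c(gamma_check = gamma_check, alpha2_check = alpha2_check,
              prop.2 = prop[1], prop.1 = prop[2]), 3))
```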
R code for Information-based Diversity Partitioning for Any Number of Levels

# Input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
# Output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
#   (2) Proportion of total beta information found at each level.
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation.1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation.2 measures the mean dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.

# Main function iDIP
iDIP=function(abun,struc){
  n=sum(abun); N=ncol(abun)
  ga=rowSums(abun)
  gp=ga[ga>0]/n                       # pooled (ecosystem) frequencies
  G=sum(-gp*log(gp))                  # gamma entropy
  S=length(gp)
  H=nrow(struc)                       # number of hierarchical levels
  A=numeric(H-1); W=numeric(H-1); B=numeric(H-1)
  Diff=numeric(H-1); Prop=numeric(H-1)
  # Lowest level: entropy of population weights and within-population alpha entropy
  wi=colSums(abun)/n
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]))
  pi=sapply(1:N,function(k) abun[,k]/sum(abun[,k]))
  Ai=sapply(1:N,function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])))
  A[H-1]=sum(wi*Ai)
  # Intermediate levels: pool populations belonging to the same aggregate
  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,]); NN=length(I)
      ai=matrix(0,ncol=NN,nrow=nrow(abun))
      for(j in 1:NN){
        II=which(struc[i,]==I[j])
        if(length(II)==1){ai[,j]=abun[,II]
        }else{ai[,j]=rowSums(abun[,II])}
      }
      pi=sapply(1:NN,function(k) ai[,k]/sum(ai[,k]))
      wi=colSums(ai)/sum(ai)
      W[i-1]=-sum(wi*log(wi))
      Ai=sapply(1:NN,function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])))
      A[i-1]=sum(wi*Ai)
    }
  }
  # Differentiation, proportion of beta information and beta diversity per level
  total=G-A[H-1]
  Diff[1]=(G-A[1])/W[1]
  Prop[1]=(G-A[1])/total
  B[1]=exp(G)/exp(A[1])
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1])
      Prop[i]=(A[i-1]-A[i])/total
      B[i]=exp(A[i-1])/exp(A[i])
    }
  }
  Gamma=exp(G); Alpha=exp(A); Diff=Diff; Prop=Prop
  out=matrix(c(Gamma,Alpha,B,Prop,Diff),ncol=1)
  rownames(out)<-c(paste0("D_gamma"),
                   paste0("D_alpha.",(H-1):1),
                   paste0("D_beta.",(H-1):1),
                   paste0("Proportion.",(H-1):1),
                   paste0("Differentiation.",(H-1):1)
  )
  return(out)
}
Phylogenetic Diversity (R function iDIPphylo)
In addition to the two data matrices (called "Abun" and "Struc", respectively) as described for species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.
Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.
Input data:
Data=cbind(c(107020)c(16012511)c(20111403)c(10511112)c(1514021100))
rownames(Data) = paste0("Allele", 1:6)
Struc = rbind(rep("Ecosystem", 5), c(rep("Region1", 2), rep("Region2", 3)), paste0("Community", 1:5))
Tree=c((((Allele11666254448Allele22886156926)4370264926Allele35919367445)4349065302(Allele4967060281Allele54965919121Allele615361314)5492297125))
Run R function: iDIPphylo(Data, Struc, Tree)
Output is given below:

               [,1]
Faith's PD  321.525
mean_T       94.169
PD_gamma    274.388
PD_alpha2   255.194
PD_alpha1   223.231
PD_beta2      1.075
PD_beta1      1.143
PD_prop2      0.351
PD_prop1      0.649
PD_diff2      0.124
PD_diff1      0.149
The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.
(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus, 255.194 x 1.075 = 274.388 (= PD_gamma).
(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here, 223.231 x 1.143 = 255.194 (= PD_alpha2).
(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.5%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
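As a quick arithmetic check on the output above, the multiplicative chain and the proportions can be recomputed from the printed values. Because the printed beta values are rounded to three decimals, the products agree with the reported alpha and gamma values only approximately.

```latex
\begin{align*}
PD_{\alpha 1} \times PD_{\beta 1} &= 223.231 \times 1.143 \approx 255.15 \approx PD_{\alpha 2} = 255.194,\\
PD_{\alpha 2} \times PD_{\beta 2} &= 255.194 \times 1.075 \approx 274.33 \approx PD_{\gamma} = 274.388,\\
PD_{\mathrm{prop}1} + PD_{\mathrm{prop}2} &= 0.649 + 0.351 = 1.
\end{align*}
```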
# R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
# Input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community
#          frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix)
#   tree:  a Newick-format phylogenetic tree spanned by all focal species considered
#          in a study
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA"
# in your data by 0 or any numerical value. Also, there must be at least one
# species/allele in a population or a community.
# Output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root
#       node to each of the tips in a phylogenetic tree. For an ultrametric tree,
#       mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, and alpha and beta PD
#       for each level.
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta
#       information found in Level 1 and Level 2, respectively.
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example,
#       PD_diff1 measures the mean phylogenetic dissimilarity among communities
#       (Level 1) within a region (Level 2); PD_diff2 measures the mean phylogenetic
#       dissimilarity among regions (Level 2), etc.

# Three packages, "ade4", "ape", and "phytools", must be installed first
install.packages("ade4"); library(ade4)
install.packages("ape"); library(ape)
install.packages("phytools"); library(phytools)

# Main function iDIPphylo
iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))

  # M[i, j] = 1 if branch/node j lies on the path from the root to leaf i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }

  # pA: abundance of every branch/node in each population/community
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) {
    pA[, i] = Temp[, i] %*% M
  }
  pB = c(phyloData$leaves, phyloData$nodes)   # branch lengths

  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n; TT = sum(gp * pB)              # TT: abundance-weighted mean tree depth
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])                        # Faith's PD

  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  # Lowest level: individual populations/communities
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT *
                                      log(pi[, k][pi[, k] > 0] / TT)))
  A[H - 1] = sum(wi * Ai)

  # Intermediate levels: pool the columns that belong to the same aggregate
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else {
          pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
        }
      }
      wi = ni / sum(ni)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT *
                                           log(pi[, k][pi[, k] > 0] / TT)))
      A[i - 1] = sum(wi * Ai)
    }
  }

  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }

  # exp(G) and exp(A) are already on the branch-length (PD) scale
  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste("Faith's PD"), paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H - 1):1),
                     paste0("PD_beta", (H - 1):1),
                     paste0("PD_prop", (H - 1):1),
                     paste0("PD_diff", (H - 1):1))
  return(out)
}
among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that, despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and therefore may lead to large differences in larval dispersal potential between the two species.
Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.
As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).
8 | DISCUSSION
Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.
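For a three-level hierarchy (ecosystem, region, community/population), the decomposition can be summarized as a multiplicative chain. The sketch below uses the notation of Tables 5 and 6, where the gamma diversity at each level equals the alpha diversity of the level above; the explicit product form is written out here for illustration.

```latex
\begin{align*}
D_{\gamma} &= D_{\alpha}^{(2)} \, D_{\beta}^{(2)}, \qquad
D_{\alpha}^{(2)} = D_{\gamma}^{(1)} = D_{\alpha}^{(1)} \, D_{\beta}^{(1)}, \\
\text{so that}\quad D_{\gamma} &= D_{\alpha}^{(1)} \, D_{\beta}^{(1)} \, D_{\beta}^{(2)} .
\end{align*}
```

Here $D_{\beta}^{(2)}$ is the effective number of regions and $D_{\beta}^{(1)}$ the effective number of communities/populations per region. The chain holds exactly for a single data set; values averaged over loci, as in Tables 5 and 6, satisfy it only approximately.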
Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and therefore give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g.,
FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1, and 2. (a) Fish species diversity of Hawaiian coral reef communities, (b) genetic diversity for Etelis coruscans, (c) genetic diversity for Zebrasoma flavescens
[Panel titles: (a) species diversity; (b) E. coruscans; (c) Z. flavescens]
bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.
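The role of the order q can be illustrated with a short calculation. The sketch below assumes two made-up frequency vectors and a helper named `hill_number`; both are illustrative and not part of the paper's R package. For perfectly even frequencies the three orders coincide, whereas for uneven frequencies the Hill number drops as q increases.

```r
# Hill number (effective number of alleles/species) of order q.
# p: vector of counts or relative frequencies; q: order (q >= 0).
hill_number <- function(p, q) {
  p <- p[p > 0] / sum(p)
  if (abs(q - 1) < 1e-8) {
    exp(-sum(p * log(p)))        # limit q -> 1: exponential of Shannon entropy
  } else {
    sum(p^q)^(1 / (1 - q))
  }
}

even <- rep(0.25, 4)                  # four equally frequent alleles
uneven <- c(0.85, 0.05, 0.05, 0.05)   # one dominant allele and three rare ones

orders <- c(0, 1, 2)
profile_even <- sapply(orders, function(q) hill_number(even, q))
profile_uneven <- sapply(orders, function(q) hill_number(uneven, q))

print(rbind(even = profile_even, uneven = profile_uneven))
```

In the uneven case, richness (q = 0) still counts four alleles, while the orders q = 1 and q = 2 are much lower and closer to each other, reflecting dominance by a single allele.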
As proved by Chao et al. (2015, Appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population, and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost's D, possess neither of these two properties.
Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and
FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness, (b) allelic richness for Etelis coruscans, (c) allelic richness for Zebrasoma flavescens
[Panel labels: (a) Species diversity; (b) Genetic diversity, E. coruscans; (c) Genetic diversity, Z. flavescens]
TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level                      Diversity
3 Hawaiian Archipelago     $D_{\gamma} = 8.249$
2 Region                   $D_{\gamma}^{(2)} = D_{\gamma}$, $D_{\alpha}^{(2)} = 8.083$, $D_{\beta}^{(2)} = 1.016$
1 Island (population)      $D_{\gamma}^{(1)} = D_{\alpha}^{(2)}$, $D_{\alpha}^{(1)} = 7.077$, $D_{\beta}^{(1)} = 1.117$

Differentiation among aggregates at each level
2 Region                   $\Delta_{D}^{(2)} = 0.023$
1 Island (population)      $\Delta_{D}^{(1)} = 0.062$
our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.
Recently, Karlin and Smouse (2017, Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that, when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
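The two-population example above can be reproduced with a few lines of R. This is a minimal sketch assuming equal population weights and using the two-level form of the differentiation measure (mutual information divided by the entropy of the weights); the variable and helper names are illustrative.

```r
S <- 10   # equally frequent alleles per population
A <- 4    # alleles shared by both populations
private <- S - A

# Rows: 4 shared alleles, 6 alleles private to I, 6 alleles private to II.
abun <- cbind(I  = c(rep(1, A), rep(1, private), rep(0, private)),
              II = c(rep(1, A), rep(0, private), rep(1, private)))

# Shannon entropy of a vector of counts or proportions, ignoring zero entries.
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

w <- colSums(abun) / sum(abun)                # equal weights (0.5, 0.5)
H_gamma <- shannon(rowSums(abun))             # pooled entropy
H_alpha <- sum(w * apply(abun, 2, shannon))   # mean within-population entropy
delta <- (H_gamma - H_alpha) / shannon(w)     # Shannon differentiation

print(delta)   # equals 1 - A/S = 0.6 up to floating-point error
```

Changing `A` shows that the measure tracks the nonshared proportion 1 − A/S in this idealized setting, which is the "true dissimilarity" property described in the text.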
In this paper, all diversity measures (alpha, beta, and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would be surely observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
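The negative bias of the observed (plug-in) entropy can be made concrete in the simplest possible setting. The sketch below is an illustration only: it assumes a single locus with two equally frequent alleles and computes the exact expectation of the plug-in entropy over all possible samples of size n using the binomial distribution. It does not implement the Chao and Jost (2015) estimator, and the function names are hypothetical.

```r
# x * log(x) with the convention 0 * log(0) = 0.
xlogx <- function(x) ifelse(x > 0, x * log(x), 0)

# Exact expected plug-in Shannon entropy for a sample of n gene copies drawn
# from a locus with two alleles at frequencies p and 1 - p.
expected_plugin_entropy <- function(n, p = 0.5) {
  k <- 0:n                      # possible counts of the first allele
  p_hat <- k / n                # corresponding sample frequencies
  h_hat <- -(xlogx(p_hat) + xlogx(1 - p_hat))
  sum(dbinom(k, n, p) * h_hat)  # average over the sampling distribution
}

true_entropy <- log(2)
sample_sizes <- c(5, 20, 100)
expected <- sapply(sample_sizes, expected_plugin_entropy)

print(data.frame(n = sample_sizes, expected = expected,
                 bias = expected - true_entropy))
```

The expected plug-in value stays below the true entropy for every sample size, and the gap shrinks as n grows, which is why the bias matters mainly for severely undersampled populations or communities.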
Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity Dβ and differentiation ΔD measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.
The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.
Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural
TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

Level                      Diversity
3 Hawaiian Archipelago     $D_{\gamma} = 8.404$
2 Region                   $D_{\gamma}^{(2)} = D_{\gamma}$, $D_{\alpha}^{(2)} = 8.290$, $D_{\beta}^{(2)} = 1.012$
1 Island (community)       $D_{\gamma}^{(1)} = D_{\alpha}^{(2)}$, $D_{\alpha}^{(1)} = 7.690$, $D_{\beta}^{(1)} = 1.065$

Differentiation among aggregates at each level
2 Region                   $\Delta_{D}^{(2)} = 0.014$
1 Island (community)       $\Delta_{D}^{(1)} = 0.033$
populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.
We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization, and therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
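The genetic percentages quoted above can be approximately recovered from Tables 5 and 6, assuming (this is our reading, not stated explicitly in the text) that they are the relative drop from the regional alpha diversity to the island-level alpha diversity:

```latex
\begin{align*}
\text{E. coruscans:}\quad
\frac{D_{\alpha}^{(2)} - D_{\alpha}^{(1)}}{D_{\alpha}^{(2)}}
  &= \frac{8.083 - 7.077}{8.083} \approx 0.124 \;(\approx 12.4\%),\\[4pt]
\text{Z. flavescens:}\quad
\frac{D_{\alpha}^{(2)} - D_{\alpha}^{(1)}}{D_{\alpha}^{(2)}}
  &= \frac{8.290 - 7.690}{8.290} \approx 0.072 \;(\approx 7\%).
\end{align*}
```

Small differences from the quoted values are expected because the tabulated diversities are rounded averages over loci.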
In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)'s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)'s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
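The transformation of phylogenetic entropy into a diversity can be sketched without any tree-handling package by listing each branch with its length and the total relative abundance of the species descending from it. The block below is a minimal illustration under that assumption; the function name `phylo_diversity_q1` and both toy trees are hypothetical and are not part of the paper's R package.

```r
# Phylogenetic diversity of order q = 1 from a list of branches.
# branch_length: length of each branch in the tree
# node_abundance: summed relative abundance of the species descending from each branch
phylo_diversity_q1 <- function(branch_length, node_abundance) {
  keep <- node_abundance > 0
  L <- branch_length[keep]
  a <- node_abundance[keep]
  T_bar <- sum(L * a)                                 # abundance-weighted mean tree depth
  pd <- exp(-sum(L * (a / T_bar) * log(a / T_bar)))   # effective total branch length
  c(mean_T = T_bar, PD = pd, lineages = pd / T_bar)   # lineages: effective number of lineages
}

# Star tree: three equally abundant species, each on its own branch of length 2.
star <- phylo_diversity_q1(branch_length = c(2, 2, 2),
                           node_abundance = rep(1 / 3, 3))

# Tree ((A:1,B:1):1,C:2) with equal abundances; branches listed as A, B, (A,B), C.
nested <- phylo_diversity_q1(branch_length = c(1, 1, 1, 2),
                             node_abundance = c(1 / 3, 1 / 3, 2 / 3, 1 / 3))

print(rbind(star = star, nested = nested))
```

For the star tree, the three species are maximally distinct and the effective total branch length equals the full branch length of 6. When A and B share part of their history, the effective branch length falls below the total branch length of 5 (Faith's PD), reflecting the relatedness of the two species.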
There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithm may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.
The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.
An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.
ACKNOWLEDGEMENTS
This work was assisted through participation in the "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.
DATA ARCHIVING STATEMENT
All data used in this manuscript are available in DRYAD (https://doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).
ORCID
Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493
Christine Edwards http://orcid.org/0000-0001-8837-4872
REFERENCES
Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2
Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101
Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.
Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665
Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.
Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.
Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654
Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551
Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247
Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272
Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540
Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471
Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014
Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819
Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083
Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1
Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2
Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036
Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75
Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352
Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x
Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x
Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1
Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x
Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum x falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.
Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.
Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.
Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3
Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x
Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.
Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x
Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x
Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x
Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.
Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x
Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.
Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.
Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696
Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B: Biological Sciences, 283(1829).
Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z
Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765
Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x
Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.
Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422
Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756
de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.
Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x
Williams IDBaumJKHeenanAHansonKMNadonMOampBrainard R E (2015)Human oceanographic and habitat drivers ofcentral and western Pacific coral reef fish assemblages PLoS One 10(4)e0120516ltGotoISIgtWOS000352135600033httpsdoiorg101371journalpone0120516
WoldaH(1981)SimilarityindexessamplesizeanddiversityOecologia 50(3)296ndash302
WrightS(1951)ThegeneticalstructureofpopulationsAnnals of Eugenics 15(4)323ndash354
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.
How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593
APPENDIX S1. Effective number of alleles: A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

| Allele | Population 1 | Population 2 |
|--------|--------------|--------------|
| A1     | 0.5          | 0.01         |
| A2     | 0.5          | 0.10         |
| A3     | 0            | 0.665        |
| A4     | 0            | 0.215        |
| A5     | 0            | 0.01         |
| He     | 0.5          | 0.5          |
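The equivalence stated in Appendix S1 can be checked numerically. The following is a minimal sketch in R, assuming that expected heterozygosity is computed as He = 1 − Σ p_i² and that the effective number of alleles is 1/Σ p_i² (the Hill number of order 2); the function names `expected_het` and `effective_alleles` are illustrative and do not come from the original supplement.

```r
# Allele frequencies taken from the table in Appendix S1
pop1 <- c(A1 = 0.5, A2 = 0.5, A3 = 0, A4 = 0, A5 = 0)
pop2 <- c(A1 = 0.01, A2 = 0.10, A3 = 0.665, A4 = 0.215, A5 = 0.01)

# Expected heterozygosity: He = 1 - sum(p_i^2)
expected_het <- function(p) 1 - sum(p^2)

# Effective number of alleles (order q = 2): 1 / sum(p_i^2) = 1 / (1 - He)
effective_alleles <- function(p) 1 / sum(p^2)

he <- c(pop1 = expected_het(pop1), pop2 = expected_het(pop2))
ne <- c(pop1 = effective_alleles(pop1), pop2 = effective_alleles(pop2))

print(round(he, 3))
print(round(ne, 3))
```

Both populations return a heterozygosity of about 0.5 and an effective number of about two alleles, even though population 2 carries five distinct alleles.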
APPENDIX S2. Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

| Symbol | Definition |
|--------|------------|
| $^{q}D$ | abundance diversity of order q, also referred to as Hill number of order q |
| $^{q}PD$ | phylogenetic diversity of order q |
| $S$ | total number of distinct elements (species/alleles) |
| $H$ | Shannon entropy |
| $K$ | number of regions |
| $J_k$ | number of local populations/communities within region k |
| $w_{jk}$ | weight given to population/community j of region k |
| $w_{+k}$ | weight given to region k |
| $D_\alpha^{(l)}$ | alpha abundance diversity at level l of the hierarchy |
| $D_\beta^{(l)}$ | beta abundance diversity at level l of the hierarchy |
| $D_\gamma^{(l)}$ | gamma abundance diversity at level l of the hierarchy |
| $l$ | superscript to denote population/community, subregion, region, … |
| $D_\gamma$ | gamma abundance diversity at the ecosystem level |
| $N_{injk}$ | number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k |
| $N_{ijk}$ | total number of copies of allele/species i in population/community j of region k |
| $N_{+jk}$ | total number of alleles/individuals in population j of region k |
| $N_{++k}$ | total number of alleles/individuals in region k |
| $N_{+++}$ | total number of alleles/individuals in the ecosystem |
| $p_{i|jk}$ | frequency of allele/species i in population j of region k |
| $p_{i|+k}$ | frequency of allele/species i in region k |
| $p_{i|++}$ | frequency of allele/species i in the ecosystem |
| $H_{\alpha jk}^{(1)}$ | alpha entropy for population j of region k |
| $H_{\alpha k}^{(2)}$ | alpha entropy for region k |
| $H_\alpha^{(l)}$ | total alpha entropy at level l of the hierarchy |
| $\Delta_D^{(l)}$ | abundance diversity differentiation among aggregates (l = populations/communities, regions) |
| $B$ | number of branch segments in the phylogenetic tree |
| $L_i$ | length of branch i = 1, 2, 3, …, B |
| $a_i$ | total relative abundance of elements (alleles/species) descended from the ith node/branch |
| $\bar{T}$ | mean branch length |
| $T$ | depth of an ultrametric tree (= $\bar{T}$) |
| $Q$ | Rao's quadratic entropy |
| $I$ | phylogenetic entropy |
| $a_{i|jk}$ | total relative abundance of alleles/species descended from node i in population/community j of region k |
| $a_{i|+k}$ | total relative abundance of alleles/species descended from node i across all populations/communities in region k |
| $a_{i|++}$ | total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem |
| $PD_\alpha^{(l)}$ | alpha phylogenetic diversity at level l of the hierarchy |
| $PD_\beta^{(l)}$ | beta phylogenetic diversity at level l of the hierarchy |
| $PD_\gamma^{(l)}$ | gamma phylogenetic diversity at level l of the hierarchy |
| $PD_\gamma$ | gamma phylogenetic diversity at the ecosystem level |
| $I_{\alpha jk}^{(1)}$ | alpha phylogenetic entropy for population j of region k |
| $I_{\alpha k}^{(2)}$ | alpha phylogenetic entropy for region k |
| $I_\alpha^{(l)}$ | total alpha phylogenetic entropy at level l of the hierarchy |
| $\Delta_{PD}^{(l)}$ | phylogenetic diversity differentiation among aggregates (l = populations/communities, regions) |
APPENDIX S3. Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $\{w_1, w_2, \dots, w_J\}$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$
Theorem S3.1. For the gamma and alpha entropies defined above for the two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.
Proof. Since $f(x) = -x \ln x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \dots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows. Because $\sum_{m} w_m p_{i|m} \ge w_j p_{i|j}$ for every j and the logarithm is increasing,

$$H_\gamma = -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)\Big] \le -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\big(w_j p_{i|j}\big)\Big]$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
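The two extreme cases of the theorem can be illustrated numerically. The sketch below, written in R, uses made-up frequency matrices and weights (they are assumptions chosen for illustration, not data from the study); the helper names `shannon` and `entropy_gap` are likewise hypothetical.

```r
# Shannon entropy of a frequency vector, ignoring zero entries
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Gamma minus alpha entropy and its upper bound for a two-level hierarchy.
# P: alleles-by-populations matrix of relative frequencies (columns sum to 1)
# w: population weights (sum to 1)
entropy_gap <- function(P, w) {
  h_gamma <- shannon(as.vector(P %*% w))
  h_alpha <- sum(w * apply(P, 2, shannon))
  c(gap = h_gamma - h_alpha, bound = shannon(w))
}

w <- c(0.2, 0.3, 0.5)  # illustrative population weights

# Case 1: identical allele frequency distributions in all populations
P_same <- matrix(c(0.6, 0.3, 0.1), nrow = 3, ncol = 3)

# Case 2: completely distinct populations (no shared alleles)
P_distinct <- rbind(c(0.5, 0, 0),
                    c(0.5, 0, 0),
                    c(0,   1, 0),
                    c(0,   0, 0.7),
                    c(0,   0, 0.3))

# Case 3: partially overlapping populations
P_mixed <- cbind(c(0.7, 0.2, 0.1, 0),
                 c(0.1, 0.6, 0.3, 0),
                 c(0,   0.1, 0.4, 0.5))

res_same     <- entropy_gap(P_same, w)
res_distinct <- entropy_gap(P_distinct, w)
res_mixed    <- entropy_gap(P_mixed, w)
print(rbind(res_same, res_distinct, res_mixed))
```

The ratio `gap / bound` is the quantity that the theorem guarantees to lie between 0 and 1, which is what makes it usable as a normalized differentiation measure.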
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$
This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.
(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation measure is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: If multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.
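Property (B) can be illustrated with a small numerical sketch in R. It assumes equal community weights (an assumption made here for the illustration) and uses hypothetical helper functions `shannon_diff` and `make_communities` that are not part of the authors' code.

```r
# Shannon entropy, ignoring zero entries
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Shannon differentiation (Eq. C1); equal weights are assumed by default
shannon_diff <- function(P, w = rep(1 / ncol(P), ncol(P))) {
  h_gamma <- shannon(as.vector(P %*% w))
  h_alpha <- sum(w * apply(P, 2, shannon))
  (h_gamma - h_alpha) / shannon(w)
}

# Build J communities, each with S equally common species,
# A of which are shared by all communities and S - A unique to each one
make_communities <- function(J, S, A) {
  n_species <- A + J * (S - A)
  P <- matrix(0, nrow = n_species, ncol = J)
  for (j in seq_len(J)) {
    if (A > 0) P[seq_len(A), j] <- 1 / S
    own <- A + (j - 1) * (S - A) + seq_len(S - A)
    P[own, j] <- 1 / S
  }
  P
}

P <- make_communities(J = 4, S = 10, A = 3)
delta <- shannon_diff(P)
print(c(delta = delta, expected = 1 - 3 / 10))
```

With four communities, ten species each and three species shared, the measure returns 0.7, the proportion of non-shared species per community.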
Proof. For Part (A), see Appendix S6 in Chao, Jost et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in the three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $\{w_{+1}, w_{+2}, \dots, w_{+K}\}$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} w_{+k} p_{i|+k} \ln\Big(\sum_{m=1}^{K} w_{+m} p_{i|+m}\Big)\Big].$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele i in population j, with population weights $\{w_{1k}/w_{+k}, w_{2k}/w_{+k}, \dots, w_{J_k k}/w_{+k}\}$, $j = 1, 2, \dots, J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$
Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \tag{C6}$$
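Equations (C5) and (C6) can be evaluated directly from a count matrix. The following R sketch does so for the six-allele, five-population example used in the R code section of this supplement, assuming that population weights are relative population sizes and that regions are coded as integers 1, …, K (both are assumptions of this illustration; the variable names are hypothetical).

```r
# Example counts: alleles in rows, populations in columns
abun <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)  # region membership of each population

shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

w_jk   <- colSums(abun) / sum(abun)               # population weights
w_k    <- as.vector(tapply(w_jk, region, sum))    # region weights w_{+k}
abun_k <- t(rowsum(t(abun), region))              # counts pooled within regions

H_gamma  <- shannon(rowSums(abun) / sum(abun))
H_alpha2 <- sum(w_k * apply(abun_k, 2, function(x) shannon(x / sum(x))))
H_alpha1 <- sum(w_jk * apply(abun, 2, function(x) shannon(x / sum(x))))

# Eq. (C5): differentiation among regions
max_among_regions <- -sum(w_k * log(w_k))
delta2 <- (H_gamma - H_alpha2) / max_among_regions

# Eq. (C6): differentiation among populations within regions
max_within_regions <- -sum(w_jk * log(w_jk / w_k[region]))
delta1 <- (H_alpha2 - H_alpha1) / max_within_regions

print(round(c(Differentiation2 = delta2, Differentiation1 = delta1), 3))
```

The two values agree with the Differentiation2 and Differentiation1 entries reported for the same data in the iDIP output (0.204 and 0.310).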
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$H_\gamma = -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\big(w_{jk} p_{i|jk}\big)\Big]$$

$$= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big(\ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big)\Big]$$

$$= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk}\Big] - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \frac{w_{jk}}{w_{+k}}\Big] - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k}\Big]$$

$$= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k}$$

$$\equiv H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big].$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

As shown in the proof of Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy
For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[I_\alpha^{(2)} - I_\alpha^{(1)}\big] + \big[I_\gamma - I_\alpha^{(2)}\big]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
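The equality case of Eq. (C8) can be checked on a toy example. The R sketch below assumes a hypothetical four-tip ultrametric tree of depth 2 and two regions, each holding one population, that share no branch; the tree, the abundances and the weights are invented for this illustration only.

```r
# Branch lengths of a small ultrametric tree of depth T = 2:
# tips A and B join at height 1, tips C and D join at height 1,
# and the two clades meet at the root (height 2)
L <- c(A = 1, B = 1, AB = 1, C = 1, D = 1, CD = 1)
T_depth <- 2

# Node abundances a_{i|+k} within each region (columns);
# the two regions share no branch
a_region <- cbind(
  region1 = c(A = 0.5, B = 0.5, AB = 1, C = 0,   D = 0,   CD = 0),
  region2 = c(A = 0,   B = 0,   AB = 0, C = 0.3, D = 0.7, CD = 1)
)
w_k <- c(0.4, 0.6)  # illustrative region weights

# Phylogenetic entropy: I = -sum_i L_i a_i ln a_i (zero abundances contribute 0)
phylo_entropy <- function(a, L) {
  keep <- a > 0
  -sum(L[keep] * a[keep] * log(a[keep]))
}

I_gamma  <- phylo_entropy(as.vector(a_region %*% w_k), L)
I_alpha2 <- sum(w_k * apply(a_region, 2, phylo_entropy, L = L))
upper    <- -T_depth * sum(w_k * log(w_k))

print(c(difference = I_gamma - I_alpha2, upper_bound = upper))
```

Because the regions are phylogenetically completely distinct, the among-region phylogenetic information equals its upper bound, so the normalized among-region differentiation is 1 in this example.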
SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue rectangles represent regions, and the red rectangle represents the ecosystem. Shannon entropies $H_{\ast}$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1: rows for hierarchical levels 3 (Ecosystem), 2 (Region) and 1 (Population or Community); columns Within, Between, Total and Decomposition. The frequencies $p_{i|jk}$, $p_{i|+k}$ and $p_{i|++}$ yield the entropies $H_{\alpha jk}^{(1)}$, $H_{\alpha k}^{(2)}$, $H_\alpha^{(1)}$, $H_\alpha^{(2)}$ and $H_\gamma$.]
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (five in the present example) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \dots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \dots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2: tree with tips $p_1$ to $p_5$, internal nodes $p_1 + p_2$, $p_1 + p_2 + p_3$ and $p_4 + p_5$, and the root labelled MRCA.]
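The extended abundance set described in the Figure S2 caption can be built with a tip-by-node incidence matrix. In the R sketch below, the tip abundances and branch lengths are hypothetical (the figure does not give them), and the order-1 effective number of lineages is computed as exp(−Σ (L_i/T̄) a_i ln a_i), which is stated here as an assumption of the illustration.

```r
# Relative abundances of the five tips (illustrative values)
p <- c(0.10, 0.20, 0.30, 0.15, 0.25)

# Incidence matrix: rows are tips, columns are the 8 nodes/branches
# (5 tips followed by the internal nodes {1,2}, {1,2,3} and {4,5})
M <- cbind(diag(5),
           n12  = c(1, 1, 0, 0, 0),
           n123 = c(1, 1, 1, 0, 0),
           n45  = c(0, 0, 0, 1, 1))

a <- as.vector(p %*% M)  # extended abundance set a_1, ..., a_8

# Hypothetical branch lengths giving an ultrametric tree of depth 3
L <- c(1, 1, 2, 1.5, 1.5, 1, 1, 1.5)

T_bar <- sum(L * a)                       # mean branch length (tree depth if ultrametric)
D1 <- exp(-sum(L / T_bar * a * log(a)))   # effective number of lineages, order 1

print(round(a, 2))
print(c(T_bar = T_bar, D1 = D1))
```

The last three entries of `a` reproduce the sums $p_1 + p_2$, $p_1 + p_2 + p_3$ and $p_4 + p_5$ listed in the caption, and `T_bar` equals the assumed tree depth.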
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider that there are two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as follows:

Ecosystem
  Region 1: Population 1, Population 2
  Region 2: Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

|         | Pop1 | Pop2 | Pop3 | Pop4 | Pop5 |
|---------|------|------|------|------|------|
| Allele1 | 1    | 16   | 2    | 10   | 15   |
| Allele2 | 0    | 0    | 0    | 5    | 14   |
| Allele3 | 7    | 12   | 11   | 1    | 0    |
| Allele4 | 0    | 5    | 14   | 1    | 21   |
| Allele5 | 2    | 1    | 0    | 11   | 10   |
| Allele6 | 0    | 1    | 3    | 2    | 0    |

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure of any number of levels can be expressed in a similar manner.

|        | Pop1        | Pop2        | Pop3        | Pop4        | Pop5        |
|--------|-------------|-------------|-------------|-------------|-------------|
| Level3 | Ecosystem   | Ecosystem   | Ecosystem   | Ecosystem   | Ecosystem   |
| Level2 | Region1     | Region1     | Region2     | Region2     | Region2     |
| Level1 | Population1 | Population2 | Population3 | Population4 | Population5 |

For the above simple example, the input data for the R function iDIP (code given below) are as follows.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

# Run R function
iDIP(Data, Struc)
Output is given below:

                  [,1]
D_gamma           5.272
D_alpha2          4.679
D_alpha1          3.540
D_beta2           1.127
D_beta1           1.322
Proportion2       0.300
Proportion1       0.700
Differentiation2  0.204
Differentiation1  0.310

We give simple interpretations for the above output ("effective" is always meant in the sense of equally abundant alleles/populations/regions).
(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 × 1.127 ≈ 5.272 (= D_gamma).
(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 × 3.540 ≈ 4.679 allele equivalents per region (= D_alpha2).
(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
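The relationships among the reported quantities can be re-derived from the printed values alone. The R sketch below copies the rounded numbers from the output above, so agreement is only expected up to rounding; the variable names are illustrative.

```r
# Values copied from the iDIP output shown above (rounded to three decimals)
out <- c(D_gamma = 5.272, D_alpha2 = 4.679, D_alpha1 = 3.540,
         D_beta2 = 1.127, D_beta1 = 1.322)

# Multiplicative partitioning: gamma = alpha x beta at each level
gamma_from_level2  <- out[["D_alpha2"]] * out[["D_beta2"]]
alpha2_from_level1 <- out[["D_alpha1"]] * out[["D_beta1"]]

# Proportion of total beta information at each level: ln(beta) / sum of ln(beta)
beta_info  <- log(c(level2 = out[["D_beta2"]], level1 = out[["D_beta1"]]))
proportion <- beta_info / sum(beta_info)

print(round(c(gamma = gamma_from_level2, alpha2 = alpha2_from_level1), 3))
print(round(proportion, 3))
```

The logarithms of the beta diversities are the additive "beta information" components, which is why their shares reproduce Proportion2 and Proportion1.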
R code for Information-based Diversity Partitioning for Any Number of Levels

# Input data:
#   abun:  alleles-by-population frequency matrix (or species-by-community frequency matrix)
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix)
# Output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level
#   (2) Proportion of total beta information found at each level
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.

# Main function iDIP
iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga > 0] / n
  G = sum(-gp * log(gp))                       # gamma entropy
  S = length(gp)
  H = nrow(struc)                              # number of hierarchical levels
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  # Lowest level (populations/communities)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H - 1] = sum(wi * Ai)

  # Intermediate levels (regions, subregions, ...)
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          ai[, j] = abun[, II]
        } else {
          ai[, j] = rowSums(abun[, II])
        }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i - 1] = sum(wi * Ai)
    }
  }

  # Differentiation, proportion of beta information and beta diversity
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }

  Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha", (H - 1):1),
                     paste0("D_beta", (H - 1):1),
                     paste0("Proportion", (H - 1):1),
                     paste0("Differentiation", (H - 1):1))
  return(out)
}
Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) described for species diversity, we also need to input a phylogenetic "Tree" in Newick tree format.

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) = paste0("Allele",1:6)
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
Tree = c("((((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125));")

# Run R function
iDIPphylo(Data, Struc, Tree)

Output is given below:
             [,1]
Faith's PD   321.525
mean_T       94.169
PD_gamma     274.388
PD_alpha2    255.194
PD_alpha1    223.231
PD_beta2     1.075
PD_beta1     1.143
PD_prop2     0.351
PD_prop1     0.649
PD_diff2     0.124
PD_diff1     0.149

"Effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.
(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 × 1.075 ≈ 274.388 (= PD_gamma).
(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 × 1.143 ≈ 255.194 (= PD_alpha2).
(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

# Input data:
#   abun:  alleles-by-population frequency matrix (or species-by-community frequency matrix)
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix)
#   tree:  a Newick-format phylogenetic tree spanned by all focal species considered in a study
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA"
#       in your data by 0 or any numerical value. Also, there must be at least one
#       species/allele in a population or a community.
# Output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root node
#       to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta
#       information found in Level 1 and Level 2, respectively
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1
#       measures the mean phylogenetic dissimilarity among communities (Level 1) within a
#       region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among
#       regions (Level 2), etc.

# Three packages ("ade4", "ape" and "phytools") must be installed first
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

# Main function iDIPphylo
iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, node] = 1 if node lies on the path from the root to leaf i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }

  # Node-by-population abundances and branch lengths
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
  pB = c(phyloData$leaves, phyloData$nodes)

  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n
  TT = sum(gp * pB)                            # mean_T
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])                         # Faith's PD

  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  # Lowest level (populations/communities)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H - 1] = sum(wi * Ai)

  # Intermediate levels (regions, subregions, ...)
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else {
          pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
        }
      }
      # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = ni / sum(ni)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i - 1] = sum(wi * Ai)
    }
  }

  # Differentiation, proportion of beta information and beta diversity
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }

  # Gamma = exp(G) / TT; Alpha = exp(A) / TT; Diff = Diff; Prop = Prop
  Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  # out1 = iDIP(abun, struc)
  # out = cbind(out1, out2)
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H - 1):1),
                     paste0("PD_beta", (H - 1):1),
                     paste0("PD_prop", (H - 1):1),
                     paste0("PD_diff", (H - 1):1))
  return(out)
}
bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that allele/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts the Hill number with respect to the order q ≥ 0, contains all information about allele/species abundance distributions.
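The interpretation of Hill numbers of order q = 0, 1, 2 described above can be illustrated with a minimal R sketch. The function name `hill_number` and the allele frequencies are hypothetical and chosen only for illustration; they are not taken from the paper's R package or data.

```r
# Hill number (effective number of alleles/species) of order q.
# `p` is a vector of relative frequencies; zero entries are ignored.
# NOTE: hill_number() is an illustrative helper, not part of the paper's package.
hill_number <- function(p, q) {
  p <- p[p > 0] / sum(p[p > 0])
  if (q == 1) {
    exp(-sum(p * log(p)))        # limit q -> 1: exponential of Shannon entropy
  } else {
    sum(p^q)^(1 / (1 - q))
  }
}

# Hypothetical allele frequencies: one dominant allele and four rare ones.
p_uneven <- c(0.90, 0.04, 0.03, 0.02, 0.01)
profile <- sapply(c(0, 1, 2), function(q) hill_number(p_uneven, q))
names(profile) <- c("q0", "q1", "q2")
print(round(profile, 3))
```

With these frequencies the order-0 value (richness) is far larger than the order-1 and order-2 values, which are close to each other, matching the pattern the text associates with a population dominated by few alleles and containing several rare ones.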
As proved by Chao et al. (2015, Appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population, and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost’s D, possess neither of these two properties.
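Monotonicity property (i) can be illustrated with a small R sketch. The two-population example below is hypothetical (it is not one of the examples in Chao et al., 2015), the helper names `shannon_diff` and `gst` are illustrative, and equal population weights are assumed. Shannon differentiation is computed as the entropy excess (gamma minus alpha) divided by the entropy of the weights, and GST as (H_T − H_S)/H_T from expected heterozygosities.

```r
# Shannon differentiation for populations stored as columns of `P`
# (rows = alleles, each column sums to 1), with population weights `w`.
shannon_diff <- function(P, w) {
  ent <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }
  h_gamma <- ent(as.vector(P %*% w))      # entropy of the pooled frequencies
  h_alpha <- sum(w * apply(P, 2, ent))    # weighted mean within-population entropy
  (h_gamma - h_alpha) / ent(w)
}

# G_ST from expected heterozygosities (order q = 2).
gst <- function(P, w) {
  het <- function(p) 1 - sum(p^2)
  h_t <- het(as.vector(P %*% w))
  h_s <- sum(w * apply(P, 2, het))
  (h_t - h_s) / h_t
}

w <- c(0.5, 0.5)   # assumed equal weights
# Before: population 1 is fixed for allele A, population 2 for allele B.
before <- cbind(pop1 = c(A = 1, B = 0, C = 0), pop2 = c(0, 1, 0))
# After: a new unshared allele C is added to population 1.
after  <- cbind(pop1 = c(A = 0.5, B = 0, C = 0.5), pop2 = c(0, 1, 0))

res <- rbind(
  before = c(GST = gst(before, w), Shannon = shannon_diff(before, w)),
  after  = c(GST = gst(after, w),  Shannon = shannon_diff(after, w))
)
print(res)
```

In this sketch the two populations share no alleles either before or after the change, so Shannon differentiation stays at its maximum, whereas GST drops even though the populations have, if anything, become more distinct.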
Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and “true dissimilarity” properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and
FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness; (b) allelic richness for Etelis coruscans; (c) allelic richness for Zebrasoma flavescens.
[Figure panels: (a) Species diversity; (b) Genetic diversity, E. coruscans; (c) Genetic diversity, Z. flavescens]
TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci.

Level                      Diversity
3 Hawaiian Archipelago     D_γ = 8.249
2 Region                   D_γ^(2) = D_γ;  D_α^(2) = 8.083;  D_β^(2) = 1.016
1 Island (population)      D_γ^(1) = D_α^(2);  D_α^(1) = 7.077;  D_β^(1) = 1.117

Level                      Differentiation among aggregates at each level
2 Region                   Δ_D^(2) = 0.023
1 Island (community)       Δ_D^(1) = 0.062
our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.
Recently, Karlin and Smouse (2017; Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the “true dissimilarity” property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of “true dissimilarity” can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse’s (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure “true dissimilarity.” Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then intuitively any differentiation measure must yield 60%. However, Karlin and Smouse’s measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
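The two-population example in the preceding paragraph (10 equally frequent alleles per population, 4 shared) can be checked numerically. The R sketch below assumes equal population weights and uses an illustrative helper, `shannon_diff`, for the two-level Shannon differentiation (entropy excess divided by the entropy of the weights); it does not attempt to reproduce Karlin and Smouse's measure.

```r
# Two-level Shannon differentiation; columns of `P` are populations,
# rows are alleles, `w` are population weights (illustrative helper).
shannon_diff <- function(P, w) {
  ent <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }
  h_gamma <- ent(as.vector(P %*% w))
  h_alpha <- sum(w * apply(P, 2, ent))
  (h_gamma - h_alpha) / ent(w)
}

S <- 10                               # alleles per population
A <- 4                                # alleles shared by both populations
n_total <- A + 2 * (S - A)            # distinct alleles in the pooled system
P <- matrix(0, nrow = n_total, ncol = 2)
P[1:A, ] <- 1 / S                     # shared alleles, present in both populations
P[(A + 1):S, 1] <- 1 / S              # alleles private to population I
P[(S + 1):n_total, 2] <- 1 / S        # alleles private to population II

delta <- shannon_diff(P, w = c(0.5, 0.5))
print(c(shannon_differentiation = delta, nonshared_proportion = 1 - A / S))
```

Under these assumptions the computed differentiation coincides with 1 − A/S, the proportion of nonshared alleles.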
In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown, so all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would surely be observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., obtained by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity Dβ and differentiation ΔD measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.
The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species and, therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.
Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural
TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci.

Level                      Diversity
3 Hawaiian Archipelago     D_γ = 8.404
2 Region                   D_γ^(2) = D_γ;  D_α^(2) = 8.290;  D_β^(2) = 1.012
1 Island (community)       D_γ^(1) = D_α^(2);  D_α^(1) = 7.690;  D_β^(1) = 1.065

Level                      Differentiation among aggregates at each level
2 Region                   Δ_D^(2) = 0.014
1 Island (community)       Δ_D^(1) = 0.033
populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.
We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides “true diversity” measures that are consistent across levels of organization and, therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
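The genetic percentages quoted in this paragraph can be recovered from Tables 5 and 6 under one reading, which is an assumption rather than something the text states explicitly: that the "difference between regional and island levels" is the relative drop in alpha diversity of order q = 1 from the regional level (level 2) to the island level (level 1). The species figure cannot be rechecked in the same way from the values shown here.

```latex
\frac{D_\alpha^{(2)} - D_\alpha^{(1)}}{D_\alpha^{(2)}}
  = \frac{8.083 - 7.077}{8.083} \approx 0.1244
  \quad (\textit{E. coruscans}),
\qquad
\frac{D_\alpha^{(2)} - D_\alpha^{(1)}}{D_\alpha^{(2)}}
  = \frac{8.290 - 7.690}{8.290} \approx 0.0724
  \quad (\textit{Z. flavescens}).
```

These values agree with the 12.44% and approximately 7% reported for the two species.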
In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)’s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)’s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon’s entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals, or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
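The "simple transformation of the phylogenetic entropy" mentioned above can be sketched in R for a toy tree. The sketch assumes the order-1 form from Chao et al. (2010), in which phylogenetic diversity equals the mean branch length multiplied by the exponential of the phylogenetic entropy divided by the mean branch length. The tree, the tip frequencies and all object names are hypothetical.

```r
# Hypothetical ultrametric tree ((A:1,B:1):1,C:2) of depth 2.
# Each row is one branch: its length L and an indicator of the tips descended from it.
branches <- data.frame(
  L = c(1, 1, 1, 2),
  A = c(1, 0, 1, 0),
  B = c(0, 1, 1, 0),
  C = c(0, 0, 0, 1),
  row.names = c("tipA", "tipB", "nodeAB", "tipC")
)
p <- c(A = 0.25, B = 0.25, C = 0.50)        # hypothetical tip relative abundances

# Branch abundances a_i: total relative abundance descended from each branch.
a <- as.vector(as.matrix(branches[, names(p)]) %*% p)
L <- branches$L

T_bar <- sum(L * a)                                    # mean branch length
I_phy <- -sum(L[a > 0] * a[a > 0] * log(a[a > 0]))     # phylogenetic entropy
PD1 <- T_bar * exp(I_phy / T_bar)                      # phylogenetic diversity, q = 1
faith_pd <- sum(L[a > 0])                              # Faith's PD (total branch length)

print(c(T_bar = T_bar, I = I_phy, PD1 = PD1, FaithPD = faith_pd))
```

For an ultrametric tree the mean branch length equals the tree depth, and the order-1 phylogenetic diversity cannot exceed Faith's PD, which it would reach only if abundance were spread evenly along the tree.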
There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithms may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and “true dissimilarity” properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.
The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.
An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.
ACKNOWLEDGEMENTS
This work was assisted through participation in the “Next Generation Genetic Monitoring” Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center’s Coral Reef Ecosystem Division (CRED), with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.
DATA ARCHIVING STATEMENT
All data used in this manuscript are available in DRYAD (https://doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).
ORCID
Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493
Christine Edwards http://orcid.org/0000-0001-8837-4872
REFERENCES
Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2
Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101
Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.
Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis “marshi” (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665
Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.
Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.
Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654
Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551
Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247
Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272
Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540
Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471
Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014
Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819
Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083
Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1
Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2
Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036
Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75
Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352
Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x
Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x
Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1
Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x
Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum × falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.
Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.
Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.
Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3
Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x
Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.
Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x
Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x
Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x
Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.
Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x
Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.
Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.
Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696
Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B: Biological Sciences, 283(1829).
Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a “pristine” coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z
Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765
Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x
Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.
Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422
Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756
de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.
Whitlock, M. C. (2011). G’ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x
Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516
Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.
Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.
How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593
APPENDIX S1 Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an “ideal” population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the “effective” number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
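The equivalence described in Appendix S1 can be verified with a short R sketch. It uses the standard relation between expected heterozygosity and the effective number of alleles of order q = 2, namely 1/(1 − He), which equals one over the sum of squared frequencies; the object names are illustrative.

```r
# Allele frequencies from the Appendix S1 table.
pop1 <- c(A1 = 0.5,  A2 = 0.5,  A3 = 0,     A4 = 0,     A5 = 0)
pop2 <- c(A1 = 0.01, A2 = 0.10, A3 = 0.665, A4 = 0.215, A5 = 0.01)

het <- function(p) 1 - sum(p^2)          # expected heterozygosity He
n_eff <- function(p) 1 / (1 - het(p))    # effective number of alleles (order q = 2)

summary_tab <- rbind(
  He    = c(pop1 = het(pop1),   pop2 = het(pop2)),
  n_eff = c(pop1 = n_eff(pop1), pop2 = n_eff(pop2))
)
print(round(summary_tab, 3))
```

Both populations have an expected heterozygosity of about 0.5 and therefore an effective number of alleles of about two, even though population 2 carries five distinct alleles.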
APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol                  Definition
${}^{q}D$               abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$              phylogenetic diversity of order q
$S$                     total number of distinct elements (species/alleles)
$H$                     Shannon entropy
$K$                     number of regions
$J_k$                   number of local populations/communities within region k
$w_{jk}$                weight given to population/community j of region k
$w_{+k}$                weight given to region k
$D_\alpha^{(l)}$        alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$         beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$        gamma abundance diversity at level l of the hierarchy
$l$                     superscript to denote population/community, subregion, region, …
$D_\gamma$              gamma abundance diversity at the ecosystem level
$N_{injk}$              number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$               total number of copies of allele/species i in population/community j of region k
$N_{+jk}$               total number of alleles/individuals in population j of region k
$N_{++k}$               total number of alleles/individuals in region k
$N_{+++}$               total number of alleles/individuals in the ecosystem
$p_{i|jk}$              frequency of allele/species i in population j of region k
$p_{i|+k}$              frequency of allele/species i in region k
$p_{i|++}$              frequency of allele/species i in the ecosystem
$H_{\alpha,jk}^{(1)}$   alpha entropy for population j of region k
$H_{\alpha,k}^{(2)}$    alpha entropy for region k
$H_\alpha^{(l)}$        total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$        abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$                     number of branch segments in the phylogenetic tree
$L_i$                   length of branch i = 1, 2, 3, …, B
$a_i$                   total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$               mean branch length
$T$                     depth of an ultrametric tree ($= \bar{T}$)
$Q$                     Rao’s quadratic entropy
$I$                     phylogenetic entropy
$a_{i|jk}$              total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$              total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$              total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$       alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$        beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$       gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$             gamma phylogenetic diversity at the ecosystem level
$I_{\alpha,jk}^{(1)}$   alpha phylogenetic entropy for population j of region k
$I_{\alpha,k}^{(2)}$    alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$        total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$     phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)
APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term “allele frequency” with “species frequency.” We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{k=1}^{J} w_k p_{i|k}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$
Theorem S3.1. For the gamma and alpha entropies defined above for the two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.
Proof. Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele $i$ we have

$$-\left(\sum_{j=1}^{J} w_j p_{i|j}\right) \ln\left(\sum_{j=1}^{J} w_j p_{i|j}\right) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \left(\sum_{j=1}^{J} w_j p_{i|j}\right) \ln\left(\sum_{j=1}^{J} w_j p_{i|j}\right) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele $i = 1, 2, \ldots, S$, i.e., all $J$ populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

$$H_\gamma = -\sum_{i=1}^{S} \left(\sum_{j=1}^{J} w_j p_{i|j} \ln\left(\sum_{m=1}^{J} w_m p_{i|m}\right)\right) \le -\sum_{i=1}^{S} \left(\sum_{j=1}^{J} w_j p_{i|j} \ln\left(w_j p_{i|j}\right)\right)$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$

When all $J$ populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
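The two boundary cases of Theorem S3.1 can be checked numerically. The following R sketch is illustrative only: the helper names `shannon` and `two_level`, the weights, and the three frequency matrices are hypothetical and are not part of the iDIP code given later in this supplement.

```r
# Shannon entropy of a probability vector (zero entries contribute 0).
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Gap H_gamma - H_alpha and its upper bound for a two-level hierarchy.
# P: S x J matrix; column j holds the allele frequencies p_{i|j}.
# w: population weights (w_1, ..., w_J), summing to 1.
two_level <- function(P, w) {
  H_gamma <- shannon(as.vector(P %*% w))      # entropy of pooled frequencies p_{i|+}
  H_alpha <- sum(w * apply(P, 2, shannon))    # weighted mean within-population entropy
  c(gap = H_gamma - H_alpha, bound = shannon(w))
}

w <- c(0.2, 0.3, 0.5)                         # hypothetical weights

# Identical populations: the gap should be 0.
P_same <- matrix(rep(c(0.5, 0.3, 0.2), 3), ncol = 3)
same <- two_level(P_same, w)

# Completely distinct populations (no shared alleles): the gap should equal the bound.
P_distinct <- matrix(0, nrow = 6, ncol = 3)
P_distinct[1:2, 1] <- c(0.6, 0.4)
P_distinct[3:4, 2] <- c(0.5, 0.5)
P_distinct[5:6, 3] <- c(0.9, 0.1)
distinct <- two_level(P_distinct, w)

# Partially overlapping populations: the gap should fall strictly between 0 and the bound.
P_mixed <- cbind(c(0.5, 0.5, 0, 0), c(0, 0.5, 0.5, 0), c(0, 0, 0.5, 0.5))
mixed <- two_level(P_mixed, w)

print(rbind(same, distinct, mixed))
```

Dividing `gap` by `bound` gives a value in [0, 1] for each case.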
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$
This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the $J$ populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: If multiple communities each have $S$ equally common species, with exactly $A$ species shared by all of them, and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives $1 - A/S$, the true proportion of non-shared species in a community.

Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, the Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix B for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are $K$ regions with allele relative frequency $p_{i|+k}$ for allele $i$ in region $k$, with the region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} = \sum_{k=1}^{K} w_{+k}\, p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \left(\sum_{k=1}^{K} w_{+k}\, p_{i|+k}\right) \ln\left(\sum_{m=1}^{K} w_{+m}\, p_{i|+m}\right).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region $k$ and all populations within region $k$): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele $i$ in population $j$, with population weights $(w_{1k}/w_{+k},\, w_{2k}/w_{+k},\, \ldots,\, w_{J_k k}/w_{+k})$, $j = 1, 2, \ldots, J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region $k$, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over $k$ with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$
This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \tag{C6}$$
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$H_\gamma = -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln\left(w_{jk}\, p_{i|jk}\right)$$

$$= -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \left[\ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k}\right]$$

$$= -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln p_{i|jk} - \sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln w_{+k}$$

$$= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k}$$

$$\equiv H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right].$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
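The three-level measures in Eqs. (C5) and (C6) can be sketched in a few lines of R. This is a minimal illustration under stated assumptions: the function `three_level`, the weights, the region labels and the frequency matrices are all hypothetical, and the code is not the iDIP implementation listed later in this supplement. It checks the boundary cases named in Theorem S3.3.

```r
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# P: S x J matrix of within-population allele frequencies p_{i|jk} (columns sum to 1).
# w: population weights w_{jk}, summing to 1 over all populations.
# region: region label of each population (each region needs >= 1 population;
#         the denominator of Delta1 is 0 if every region holds a single population).
three_level <- function(P, w, region) {
  w_region <- sapply(split(w, region), sum)                 # w_{+k}
  H_gamma  <- shannon(as.vector(P %*% w))
  H_alpha1 <- sum(w * apply(P, 2, shannon))
  H_alpha2 <- sum(sapply(names(w_region), function(k) {
    idx <- region == k
    p_k <- as.vector(P[, idx, drop = FALSE] %*% w[idx]) / w_region[[k]]  # p_{i|+k}
    w_region[[k]] * shannon(p_k)
  }))
  max_among_regions  <- shannon(w_region)                   # upper bound in (C3)
  max_within_regions <- shannon(w) - shannon(w_region)      # upper bound in (C4)
  c(Delta2 = (H_gamma - H_alpha2) / max_among_regions,      # Eq. (C5)
    Delta1 = (H_alpha2 - H_alpha1) / max_within_regions)    # Eq. (C6)
}

w      <- c(0.1, 0.2, 0.3, 0.4)                             # hypothetical weights
region <- c("R1", "R1", "R2", "R2")

# All four populations completely distinct: both measures should be 1.
all_distinct <- three_level(diag(4), w, region)

# Regions distinct, populations within each region identical: Delta2 = 1, Delta1 = 0.
P_nested <- cbind(c(0.5, 0.5, 0, 0), c(0.5, 0.5, 0, 0),
                  c(0, 0, 0.3, 0.7), c(0, 0, 0.3, 0.7))
nested <- three_level(P_nested, w, region)

# All populations identical: both measures should be 0.
identical_pops <- three_level(matrix(rep(c(0.4, 0.6), 4), ncol = 4), w, region)

print(rbind(all_distinct, nested, identical_pops))
```

The denominator of `Delta1` uses the identity $-\sum_k \sum_j w_{jk} \ln(w_{jk}/w_{+k}) = -\sum_k \sum_j w_{jk} \ln w_{jk} + \sum_k w_{+k} \ln w_{+k}$.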
As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i} L_i\, a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i} L_i\, a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i} L_i\, a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \left[I_\alpha^{(2)} - I_\alpha^{(1)}\right] + \left[I_\gamma - I_\alpha^{(2)}\right]. \tag{C7}$$
Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth $T$:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
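For reference, dividing each component of Eq. (C7) by its upper bound in Eqs. (C8) and (C9), in the same way that Eqs. (C5) and (C6) were obtained from Eqs. (C3) and (C4), gives expressions of the following form. The symbol $\Delta_{PD}$ is an assumed label used only for this sketch; Table 3 of the main text remains the authoritative statement of the measures.

```latex
\Delta_{PD}^{(2)} = \frac{I_\gamma - I_\alpha^{(2)}}
                         {-T \sum_{k=1}^{K} w_{+k} \ln w_{+k}},
\qquad
\Delta_{PD}^{(1)} = \frac{I_\alpha^{(2)} - I_\alpha^{(1)}}
                         {-T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \left( w_{jk} / w_{+k} \right)}.
```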
SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue rectangles represent regions, and the red rectangle represents the ecosystem. Shannon entropies are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1: diagram of the three hierarchical levels (3, ecosystem; 2, region; 1, population or community) with columns Within, Between, Total and Decomposition. Two regions contain populations 1-3 and 4-5, respectively, with frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$ pooled into $p_{i|+1}$, $p_{i|+2}$ and $p_{i|++}$.]
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (5 in the present example) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species $i$.

[Figure S2: ultrametric tree with tips $p_1$ to $p_5$, interior nodes $p_1 + p_2$, $p_1 + p_2 + p_3$ and $p_4 + p_5$, and root MRCA.]
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider that there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as follows:

    Ecosystem
        Region1: Population1, Population2
        Region2: Population3, Population4, Population5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):
Pop1 Pop2 Pop3 Pop4 Pop5
Allele1 1 16 2 10 15
Allele2 0 0 0 5 14
Allele3 7 12 11 1 0
Allele4 0 5 14 1 21
Allele5 2 1 0 11 10
Allele6 0 1 3 2 0
The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure of any number of levels can be expressed in a similar manner.

            Pop1         Pop2         Pop3         Pop4         Pop5
    Level3  Ecosystem    Ecosystem    Ecosystem    Ecosystem    Ecosystem
    Level2  Region1      Region1      Region2      Region2      Region2
    Level1  Population1  Population2  Population3  Population4  Population5

For the above simple example, the input data for R function iDIP (code given below) are shown below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))
    # Run R function
    iDIP(Data, Struc)
Output is given below.

                      [,1]
    D_gamma          5.272
    D_alpha2         4.679
    D_alpha1         3.540
    D_beta2          1.127
    D_beta1          1.322
    Proportion2      0.300
    Proportion1      0.700
    Differentiation2 0.204
    Differentiation1 0.310
We give simple interpretations for the above output ("effective" is always in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).

(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).

(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.310, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
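The quantities in this output can also be recomputed directly from the Shannon entropies of Appendix S3, which makes the link between the formulas and the reported numbers explicit. The sketch below is an independent illustration, not the iDIP function itself: the helper `shannon` and the intermediate variable names are hypothetical, the `region` vector encodes the Region1/Region2 assignment of the five populations, and population weights are taken as relative population sizes.

```r
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Allele counts from the worked example (alleles in rows, populations in columns).
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)                      # region membership of each population

n     <- sum(Data)
w_pop <- colSums(Data) / n                      # population weights w_{jk}
# Pool the allele counts of the populations belonging to each region.
Reg   <- sapply(split(seq_along(region), region),
                function(idx) rowSums(Data[, idx, drop = FALSE]))
w_reg <- colSums(Reg) / n                       # region weights w_{+k}

H_gamma  <- shannon(rowSums(Data) / n)
H_alpha2 <- sum(w_reg * apply(Reg, 2, function(x) shannon(x / sum(x))))
H_alpha1 <- sum(w_pop * apply(Data, 2, function(x) shannon(x / sum(x))))

# Effective numbers are the exponentials of the entropies.
D     <- exp(c(gamma = H_gamma, alpha2 = H_alpha2, alpha1 = H_alpha1))
beta2 <- D[["gamma"]] / D[["alpha2"]]
beta1 <- D[["alpha2"]] / D[["alpha1"]]
diff2 <- (H_gamma - H_alpha2) / shannon(w_reg)                       # Eq. (C5)
diff1 <- (H_alpha2 - H_alpha1) / (shannon(w_pop) - shannon(w_reg))   # Eq. (C6)

print(round(c(D, beta2 = beta2, beta1 = beta1, diff2 = diff2, diff1 = diff1), 3))
```

Under these assumptions, the printed values should agree with the corresponding rows of the output above to three decimals, and the product `beta2 * beta1 * D[["alpha1"]]` recovers the gamma diversity.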
R code for Information-based Diversity Partitioning for Any Number of Levels

    # input data:
    #   abun:  alleles-by-population frequency matrix data (or species-by-community
    #          frequency matrix data)
    #   struc: level-by-population hierarchical structure matrix (or level-by-community
    #          hierarchical structure matrix)
    # output:
    #   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
    #   (2) Proportion of total beta information found at each level.
    #   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
    #       measures the mean dissimilarity among populations (level 1) within a region;
    #       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
    # NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
    #       or "NA" in your data by 0 or any numerical value. Also, there must be at least
    #       one species/allele in a population or a community.

    # Main function iDIP
    iDIP = function(abun, struc) {
      n = sum(abun); N = ncol(abun)
      ga = rowSums(abun)
      gp = ga[ga > 0] / n
      G = sum(-gp * log(gp))
      S = length(gp)
      H = nrow(struc)
      A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
      Diff = numeric(H-1); Prop = numeric(H-1)
      wi = colSums(abun) / n
      W[H-1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) abun[,k] / sum(abun[,k]))
      Ai = sapply(1:N, function(k) -sum(pi[,k][pi[,k] > 0] * log(pi[,k][pi[,k] > 0])))
      A[H-1] = sum(wi * Ai)
      if (H > 2) {
        for (i in 2:(H-1)) {
          I = unique(struc[i,]); NN = length(I)
          ai = matrix(0, ncol = NN, nrow = nrow(abun))
          for (j in 1:NN) {
            II = which(struc[i,] == I[j])
            if (length(II) == 1) { ai[,j] = abun[,II]
            } else { ai[,j] = rowSums(abun[,II]) }
          }
          pi = sapply(1:NN, function(k) ai[,k] / sum(ai[,k]))
          wi = colSums(ai) / sum(ai)
          W[i-1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pi[,k][pi[,k] > 0] * log(pi[,k][pi[,k] > 0])))
          A[i-1] = sum(wi * Ai)
        }
      }
      total = G - A[H-1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H-1)) {
          Diff[i] = (A[i-1] - A[i]) / (W[i] - W[i-1])
          Prop[i] = (A[i-1] - A[i]) / total
          B[i] = exp(A[i-1]) / exp(A[i])
        }
      }
      Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
      out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) <- c(paste0("D_gamma"),
                         paste0("D_alpha", (H-1):1),
                         paste0("D_beta", (H-1):1),
                         paste0("Proportion", (H-1):1),
                         paste0("Differentiation", (H-1):1))
      return(out)
    }
Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described for the species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see the simple example given above for the species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    rownames(Data) = paste0("Allele",1:6)
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
    Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")
    # Run R function
    iDIPphylo(Data, Struc, Tree)
Output is given below.

                   [,1]
    Faith's PD  321.525
    mean_T       94.169
    PD_gamma    274.388
    PD_alpha2   255.194
    PD_alpha1   223.231
    PD_beta2      1.075
    PD_beta1      1.143
    PD_prop2      0.351
    PD_prop1      0.649
    PD_diff2      0.124
    PD_diff1      0.149
The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma).

(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2).

(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

    # input data:
    #   abun:  alleles-by-population frequency matrix data (or species-by-community
    #          frequency matrix data)
    #   struc: level-by-population hierarchical structure matrix (or level-by-community
    #          hierarchical structure matrix)
    #   tree:  a Newick-format phylogenetic tree spanned by all focal species considered
    #          in a study
    # NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank"
    #       or "NA" in your data by 0 or any numerical value. Also, there must be at least
    #       one species/allele in a population or a community.
    # output:
    #   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
    #   (2) mean_T: weighted (by species abundance) mean of the distances from the root
    #       node to each of the tips in a phylogenetic tree. For an ultrametric tree,
    #       mean_T = tree depth.
    #   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD
    #       for each level.
    #   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta
    #       information found in Level 1 and Level 2, respectively.
    #   (5) Phylogenetic differentiation (dissimilarity) for each level. For example,
    #       PD_diff1 measures the mean phylogenetic dissimilarity among communities
    #       (Level 1) within a region (Level 2); PD_diff2 measures the mean phylogenetic
    #       dissimilarity among regions (Level 2), etc.

    # Three packages ("ade4", "ape" and "phytools") must be installed first.
    install.packages("ade4")
    library(ade4)
    install.packages("ape")
    library(ape)
    install.packages("phytools")
    library(phytools)

    # Main function iDIPphylo
    iDIPphylo = function(abun, struc, tree) {
      phyloData <- newick2phylog(tree)
      Temp <- as.matrix(abun[names(phyloData$leaves), ])
      nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
      M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
                 dimnames = list(names(phyloData$leaves), nodenames))
      for (i in 1:length(phyloData$leaves)) {
        M[i,][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
      }
      pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
                  dimnames = list(nodenames, colnames(abun)))
      for (i in 1:ncol(abun)) { pA[,i] = Temp[,i] %*% M }
      pB = c(phyloData$leaves, phyloData$nodes)
      n = sum(abun); N = ncol(abun)
      ga = rowSums(pA)
      gp = ga / n
      TT = sum(gp * pB)
      G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
      PD = sum(pB[gp > 0])
      H = nrow(struc)
      A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
      Diff = numeric(H-1); Prop = numeric(H-1)
      wi = colSums(abun) / n
      W[H-1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) pA[,k] / sum(abun[,k]))
      Ai = sapply(1:N, function(k) -sum(pB[pi[,k] > 0] * pi[,k][pi[,k] > 0] / TT * log(pi[,k][pi[,k] > 0] / TT)))
      A[H-1] = sum(wi * Ai)
      if (H > 2) {
        for (i in 2:(H-1)) {
          I = unique(struc[i,]); NN = length(I)
          pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
          for (j in 1:NN) {
            II = which(struc[i,] == I[j])
            if (length(II) == 1) { pi[,j] = pA[,II] / sum(abun[,II]); ni[j] = sum(abun[,II])
            } else { pi[,j] = rowSums(pA[,II]) / sum(abun[,II]); ni[j] = sum(abun[,II]) }
          }
          # pi = sapply(1:NN, function(k) ai[,k] / sum(ai[,k]))
          wi = ni / sum(ni)
          W[i-1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pB[pi[,k] > 0] * pi[,k][pi[,k] > 0] / TT * log(pi[,k][pi[,k] > 0] / TT)))
          A[i-1] = sum(wi * Ai)
        }
      }
      total = G - A[H-1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H-1)) {
          Diff[i] = (A[i-1] - A[i]) / (W[i] - W[i-1])
          Prop[i] = (A[i-1] - A[i]) / total
          B[i] = exp(A[i-1]) / exp(A[i])
        }
      }
      # Gamma = exp(G)/TT; Alpha = exp(A)/TT; Diff = Diff; Prop = Prop
      Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
      out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
      # out1 = iDIP(abun, struc); out = cbind(out1, out2)
      rownames(out) <- c(paste("Faith's PD"),
                         paste("mean_T"),
                         paste0("PD_gamma"),
                         paste0("PD_alpha", (H-1):1),
                         paste0("PD_beta", (H-1):1),
                         paste0("PD_prop", (H-1):1),
                         paste0("PD_diff", (H-1):1))
      return(out)
    }
our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.
Recently, Karlin and Smouse (2017; Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: If N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then intuitively any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
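The two-population example in the preceding paragraph can be reproduced with a short R sketch. The function name `true_dissimilarity_demo` is hypothetical, and the sketch assumes equal community weights (1/N each); it builds the idealized communities described above and evaluates the two-level Shannon differentiation, i.e., the entropy excess of the pooled community divided by its maximum.

```r
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# N communities, each with S equally common species, exactly A of them shared by all
# communities and the remaining S - A unique to each community. Equal weights assumed.
true_dissimilarity_demo <- function(N, S, A) {
  n_species <- A + N * (S - A)
  P <- matrix(0, nrow = n_species, ncol = N)
  for (j in seq_len(N)) {
    P[seq_len(A), j] <- 1 / S                       # shared species
    if (S > A) {
      own <- A + (j - 1) * (S - A) + seq_len(S - A) # species unique to community j
      P[own, j] <- 1 / S
    }
  }
  w <- rep(1 / N, N)
  H_gamma <- shannon(as.vector(P %*% w))
  H_alpha <- sum(w * apply(P, 2, shannon))
  (H_gamma - H_alpha) / shannon(w)                  # Shannon differentiation
}

delta <- true_dissimilarity_demo(N = 2, S = 10, A = 4)
print(delta)   # expected to equal 1 - 4/10 = 0.6 up to floating-point error
```

With equal weights the result follows analytically, since the entropy excess is $(1 - A/S)\ln N$ and the maximum is $\ln N$.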
In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would be surely observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
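The negative bias of the observed (plug-in) entropy mentioned above can be seen in a small simulation. The sketch below is illustrative only: the community of 20 equally common species, the sample sizes, the number of replicates and the seed are arbitrary assumptions, and it does not implement the bias-reducing estimator of Chao and Jost (2015).

```r
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

set.seed(1)                          # arbitrary seed, for reproducibility only
p_true <- rep(1 / 20, 20)            # hypothetical community: 20 equally common species
H_true <- shannon(p_true)            # true entropy, log(20)

# Plug-in entropy: substitute sample proportions into the entropy formula.
plug_in <- function(n) {
  counts <- rmultinom(1, size = n, prob = p_true)[, 1]
  shannon(counts / n)
}

bias_small <- mean(replicate(2000, plug_in(30)))   - H_true   # severe undersampling
bias_large <- mean(replicate(2000, plug_in(3000))) - H_true   # ample sampling

print(c(bias_small = bias_small, bias_large = bias_large))
```

Both average biases are negative, and the bias shrinks in magnitude as the sample size grows, which is why estimation matters most for severely undersampled communities.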
Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity $D_\beta$ and differentiation $\Delta_D$ measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.
The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.
Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci.

    Level                      Diversity
    3 Hawaiian Archipelago     $D_\gamma = 8.404$
    2 Region                   $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)} = 8.290$, $D_\beta^{(2)} = 1.012$
    1 Island (community)       $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)} = 7.690$, $D_\beta^{(1)} = 1.065$

    Level                      Differentiation among aggregates at each level
    2 Region                   $\Delta_D^{(2)} = 0.014$
    1 Island (community)       $\Delta_D^{(1)} = 0.033$
We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization, and therefore they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity; genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
In our hierarchical framework and analysis based on the Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)'s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)'s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
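To make the "simple transformation of the phylogenetic entropy" concrete, the following R sketch computes the phylogenetic entropy $I = -\sum_i L_i a_i \ln a_i$ and the order-1 phylogenetic diversity $T \exp(I/T)$ for a five-tip ultrametric tree with the same topology as Figure S2. The branch lengths, the tip abundances and the helper names are hypothetical choices made for this illustration; they are not taken from the paper's data.

```r
# Phylogenetic entropy: sum over branches of -L_i * a_i * log(a_i),
# where a_i is the total relative abundance descending from branch i.
phylo_entropy <- function(a, L) {
  keep <- a > 0
  -sum(L[keep] * a[keep] * log(a[keep]))
}

# Extended abundance set for the Figure S2 topology:
# five tips, then the interior nodes (1,2), (1,2,3) and (4,5).
node_abundances <- function(p) {
  c(p, p[1] + p[2], p[1] + p[2] + p[3], p[4] + p[5])
}

# Hypothetical branch lengths (same order as the abundance set), chosen so that
# every root-to-tip path has length 4, i.e. the tree is ultrametric with depth T = 4.
L <- c(1, 1, 2, 2, 2, 1, 2, 2)

p <- c(0.10, 0.20, 0.30, 0.25, 0.15)     # hypothetical tip abundances
a <- node_abundances(p)

T_depth <- sum(L * a)                    # tree depth (abundance-weighted mean path length)
I       <- phylo_entropy(a, L)
PD1     <- T_depth * exp(I / T_depth)    # order-1 phylogenetic diversity (effective branch length)

# Degenerate check: a community made of a single species spans one lineage of depth T.
a_single   <- node_abundances(c(1, 0, 0, 0, 0))
PD1_single <- sum(L * a_single) * exp(phylo_entropy(a_single, L) / sum(L * a_single))

print(c(T = T_depth, I = I, PD1 = PD1, PD1_single = PD1_single, Faith_PD = sum(L)))
```

In this sketch the order-1 value lies between the tree depth (a single effective lineage) and the total branch length (Faith's PD), as expected for a Hill-number-based measure.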
There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level is transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithms may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.
The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.
An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.
ACKNOWLEDGEMENTS
This work was assisted through participation in "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.
DATA ARCHIVING STATEMENT
All data used in this manuscript are available in DRYAD (https://doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).
ORCID
Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493
Christine Edwards http://orcid.org/0000-0001-8837-4872
REFERENCES
Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2
Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101
Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.
Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665
Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.
Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.
Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654
Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551
Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247
Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272
Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540
Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471
Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014
Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819
Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083
Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1
Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2
Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036
Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75
Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352
Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x
Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x
Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1
Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x
Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum x falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.
Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.
Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.
Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3
Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x
Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.
Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x
Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x
Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x
Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.
Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x
Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.
Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.
Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696
Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Consortiu Hawaii Reef Connectivity (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B-Biological Sciences, 283(1829).
Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z
Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765
Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x
Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.
Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422
Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756
de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.
Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x
Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516
Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.
Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.
How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593
APPENDIX S1 Effective number of alleles: A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele   Population 1   Population 2
A1       0.5            0.01
A2       0.5            0.10
A3       0              0.665
A4       0              0.215
A5       0              0.01
He       0.5            0.5
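The equivalence in the table can be checked numerically. The sketch below assumes the standard relations He = 1 − Σ p_i² and effective number of alleles = 1 / Σ p_i² (the Hill number of order 2); the function names `expected_het` and `effective_alleles` are illustrative and are not part of the authors' code.

```r
# Allele frequencies copied from the Appendix S1 table.
pop1 <- c(A1 = 0.5,  A2 = 0.5,  A3 = 0,     A4 = 0,     A5 = 0)
pop2 <- c(A1 = 0.01, A2 = 0.10, A3 = 0.665, A4 = 0.215, A5 = 0.01)

# Expected heterozygosity: probability that two randomly drawn copies differ.
expected_het <- function(p) 1 - sum(p^2)

# Effective number of alleles: the number of equally frequent alleles that
# would give the same heterozygosity (assumed formula: 1 / sum(p^2)).
effective_alleles <- function(p) 1 / sum(p^2)

he <- c(pop1 = expected_het(pop1), pop2 = expected_het(pop2))
ne <- c(pop1 = effective_alleles(pop1), pop2 = effective_alleles(pop2))

print(round(he, 2))
print(round(ne, 2))
```

Both populations come out with a heterozygosity of about 0.5 and an effective number of about two alleles, even though population 2 carries five distinct alleles.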
APPENDIX S2 Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol                      Definition
${}^{q}D$                   abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$                  phylogenetic diversity of order q
$S$                         total number of distinct elements (species/alleles)
$H$                         Shannon entropy
$K$                         number of regions
$J_k$                       number of local populations/communities within region k
$w_{jk}$                    weight given to population/community j of region k
$w_{+k}$                    weight given to region k
$D_\alpha^{(l)}$            alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$             beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$            gamma abundance diversity at level l of the hierarchy
$l$                         superscript to denote population/community, subregion, region, …
$D_\gamma$                  gamma abundance diversity at the ecosystem level
$N_{injk}$                  number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$                   total number of copies of allele/species i in population/community j of region k
$N_{+jk}$                   total number of alleles/individuals in population j of region k
$N_{++k}$                   total number of alleles/individuals in region k
$N_{+++}$                   total number of alleles/individuals in the ecosystem
$p_{i|jk}$                  frequency of allele/species i in population j of region k
$p_{i|+k}$                  frequency of allele/species i in region k
$p_{i|++}$                  frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$       alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$        alpha entropy for region k
$H_\alpha^{(l)}$            total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$            abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$                         number of branch segments in the phylogenetic tree
$L_i$                       length of branch i = 1, 2, 3, …, B
$a_i$                       total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$                   mean branch length
$T$                         depth of an ultrametric tree (= $\bar{T}$)
$Q$                         Rao's quadratic entropy
$I$                         phylogenetic entropy
$a_{i|jk}$                  total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$                  total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$                  total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$           alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$            beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$           gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$                 gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$       alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$        alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$            total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$         phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)
APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \dots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1: For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof: Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \dots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows. Because $w_j p_{i|j} \le \sum_{m=1}^{J} w_m p_{i|m}$ for every j and the logarithm is increasing,

$$H_\gamma = -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)\Big] \le -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\big(w_j p_{i|j}\big)\Big]$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$
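As a numerical illustration of Eq. (C1), the following minimal sketch computes Shannon differentiation for a two-level hierarchy. The function names `shannon` and `shannon_diff` and the toy frequency matrices are assumptions made for this example; they are not taken from the authors' iDIP code.

```r
# Shannon entropy of a probability vector, with 0 * log(0) taken as 0.
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Eq. (C1). P: alleles x populations matrix of relative frequencies
# (each column sums to 1); w: population weights summing to 1.
shannon_diff <- function(P, w) {
  h_gamma <- shannon(as.vector(P %*% w))    # entropy of pooled frequencies
  h_alpha <- sum(w * apply(P, 2, shannon))  # weighted within-population entropy
  (h_gamma - h_alpha) / shannon(w)          # divide by -sum(w * log(w))
}

w <- c(0.5, 0.5)

# Two populations with identical allele frequencies.
identical_pops <- cbind(c(0.2, 0.8), c(0.2, 0.8))

# Two populations with no shared alleles.
distinct_pops <- cbind(c(0.3, 0.7, 0, 0), c(0, 0, 0.6, 0.4))

# Two populations, each with S = 4 equally common alleles, A = 1 shared.
one_shared <- cbind(c(0.25, 0.25, 0.25, 0.25, 0, 0, 0),
                    c(0.25, 0, 0, 0, 0.25, 0.25, 0.25))

d_identical  <- shannon_diff(identical_pops, w)
d_distinct   <- shannon_diff(distinct_pops, w)
d_one_shared <- shannon_diff(one_shared, w)
print(c(d_identical, d_distinct, d_one_shared))
```

The three toy cases correspond to the two extremes of the measure and to a case with S = 4 and A = 1, for which the value 1 − A/S is 0.75.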
This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2: Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.
(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation measure is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: If multiple communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.

Proof: For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived, so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3: For the decomposition given in Eq. (C2) in the three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
Proof: To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $(w_{+1}, w_{+2}, \dots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} w_{+k} p_{i|+k}\Big) \ln\Big(\sum_{m=1}^{K} w_{+m} p_{i|+m}\Big).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele i in population j, with population weights $(w_{1k}/w_{+k}, w_{2k}/w_{+k}, \dots, w_{J_k k}/w_{+k})$, j = 1, 2, …, $J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding "alpha" entropy is $\sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} (w_{jk}/w_{+k}) \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}$. Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln\frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})}. \tag{C6}$$
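Eqs. (C5) and (C6) can be evaluated directly from a count matrix. The sketch below is an independent illustration, not the authors' iDIP function: the name `three_level_diff` is hypothetical, and weights are assumed to be proportional to sample sizes. The count matrix is the five-population example used later in this supplement, for which the reported output is Differentiation2 = 0.204 and Differentiation1 = 0.310.

```r
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# abun: alleles x populations count matrix; region: region label per column.
# Assumption of this sketch: weights proportional to sample size.
three_level_diff <- function(abun, region) {
  region <- as.character(region)
  n    <- sum(abun)
  w_jk <- colSums(abun) / n                   # population weights
  w_k  <- sapply(split(w_jk, region), sum)    # region weights w_{+k}
  freq_entropy <- function(x) shannon(x / sum(x))

  h_gamma  <- shannon(rowSums(abun) / n)
  h_alpha1 <- sum(w_jk * apply(abun, 2, freq_entropy))
  pooled   <- t(rowsum(t(abun), group = region))  # alleles x regions counts
  h_alpha2 <- sum(w_k[colnames(pooled)] * apply(pooled, 2, freq_entropy))

  max2 <- shannon(w_k)                            # denominator of Eq. (C5)
  max1 <- -sum(w_jk * log(w_jk / w_k[region]))    # denominator of Eq. (C6)
  c(Delta2 = (h_gamma - h_alpha2) / max2,
    Delta1 = (h_alpha2 - h_alpha1) / max1)
}

# Example counts (alleles in rows, populations in columns).
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c("Region1", "Region1", "Region2", "Region2", "Region2")

res <- three_level_diff(Data, region)
print(round(res, 3))
```

The sketch assumes every population contains at least one individual and every region contains more than one population; otherwise a denominator can be zero.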
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared species). Note that in the latter case, we can decompose the gamma entropy as

$$H_\gamma = -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\big(w_{jk} p_{i|jk}\big)$$

$$= -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big[\ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big]$$

$$= -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk} - \sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\frac{w_{jk}}{w_{+k}} - \sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k}$$

$$= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k}$$

$$\equiv H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big].$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy
For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[I_\alpha^{(2)} - I_\alpha^{(1)}\big] + \big[I_\gamma - I_\alpha^{(2)}\big]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem, so that we can obtain the corresponding differentiation measures.

Theorem S3.4: For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
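SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represent the ecosystem. Shannon entropies $H_\alpha^{(l)}$ are calculated from allele/species abundances at each level l of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 diagram: rows for the three hierarchical levels (3 Ecosystem; 2 Region; 1 Population or Community) and columns Within, Between, Total, and Decomposition. Population-level entropies $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$ (region 1) and $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$ (region 2) are computed from $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$; region-level entropies $H_{\alpha 1}^{(2)}$ and $H_{\alpha 2}^{(2)}$ from $p_{i|+1}$ and $p_{i|+2}$; and $H_\gamma$ from $p_{i|++}$. The totals shown are $H_\alpha^{(1)}$, $H_\alpha^{(2)}$, and $H_\gamma$.]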
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \dots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \dots, p_5,\ p_1 + p_2,\ p_1 + p_2 + p_3,\ p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2 diagram: ultrametric tree with tips labelled $p_1$ to $p_5$, internal nodes labelled $p_1 + p_2$, $p_4 + p_5$, and $p_1 + p_2 + p_3$, and the root labelled MRCA.]
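The extended abundance set described in the Figure S2 caption can be built and used as follows. This is a sketch under stated assumptions: the tip abundances `p` and the branch lengths `L` are hypothetical values chosen so that the tree is ultrametric with depth 3, and the order-1 transformation (mean effective number of lineages exp(I/T), and PD as T times that number) follows the definitions of Chao et al. (2010) rather than code from this supplement.

```r
# Hypothetical relative abundances of the five tip species (sum to 1).
p <- c(0.10, 0.20, 0.30, 0.25, 0.15)

# Extended abundance set for the 8 nodes, in the order given in the caption:
# five tips, then the nodes {1,2}, {1,2,3} and {4,5}.
a <- c(p, p[1] + p[2], p[1] + p[2] + p[3], p[4] + p[5])

# Hypothetical branch lengths (same order as a) for an ultrametric tree of
# depth 3: tips 1 and 2 join at height 1, tip 3 joins them at height 2,
# tips 4 and 5 join at height 1.5, and the root (MRCA) is at height 3.
L <- c(1, 1, 2, 1.5, 1.5, 1, 1, 1.5)
T_depth <- 3

# Phylogenetic entropy: I = -sum(L_i * a_i * log(a_i)).
phylo_entropy <- -sum(L * a * log(a))

# Order-1 transformation (assumed from Chao et al., 2010).
mean_lineages <- exp(phylo_entropy / T_depth)
PD1 <- T_depth * mean_lineages

print(c(I = phylo_entropy, mean_lineages = mean_lineages, PD1 = PD1))
```

For an ultrametric tree, the abundance-weighted branch lengths sum to the tree depth, which gives a quick sanity check on the inputs (`sum(L * a)` should equal `T_depth`).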
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entry. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1
        Population 1
        Population 2
    Region 2
        Population 3
        Population 4
        Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):
Pop1 Pop2 Pop3 Pop4 Pop5
Allele1 1 16 2 10 15
Allele2 0 0 0 5 14
Allele3 7 12 11 1 0
Allele4 0 5 14 1 21
Allele5 2 1 0 11 10
Allele6 0 1 3 2 0
The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.
Pop1 Pop2 Pop3 Pop4 Pop5
Level3 Ecosystem Ecosystem Ecosystem Ecosystem Ecosystem
Level2 Region1 Region1 Region2 Region2 Region2
Level1 Population1 Population2 Population3 Population4 Population5
For the above simple example, input data for R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

# Run R function
iDIP(Data, Struc)
Output is given below.

                   [,1]
D_gamma           5.272
D_alpha.2         4.679
D_alpha.1         3.540
D_beta.2          1.127
D_beta.1          1.322
Proportion.2      0.300
Proportion.1      0.700
Differentiation.2 0.204
Differentiation.1 0.310
We give simple interpretations for the above output (each “effective” number is in the sense of equally abundant alleles/populations/regions).
(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma), up to rounding.
(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2), up to rounding.
(2) Proportion2 = 0.30 means that 30% of the total beta information is found at the regional level. Proportion1 = 0.70 means that 70% of the total beta information is found at the population level.
(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
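
As a rough cross-check of the region-level figures, the following base-R sketch recomputes the gamma diversity, the region-level alpha and beta diversities, and the among-region differentiation directly from the example allele counts, without calling iDIP. The helper name `shannon` and the variable names are illustrative only, and region weights are assumed to be proportional to the pooled counts, as in the iDIP code.

```r
# Shannon entropy of a vector of counts or proportions (zeros are ignored)
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Allele counts from the example (alleles in rows, populations in columns)
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)  # region membership of the five populations

# Pool the populations within each region
pooled <- sapply(1:2, function(r) rowSums(Data[, region == r, drop = FALSE]))
w <- colSums(pooled) / sum(pooled)  # region weights, proportional to size

H_gamma  <- shannon(rowSums(Data))              # entropy of the whole ecosystem
H_alpha2 <- sum(w * apply(pooled, 2, shannon))  # weighted mean within-region entropy

D_gamma  <- exp(H_gamma)
D_alpha2 <- exp(H_alpha2)
D_beta2  <- D_gamma / D_alpha2
Diff2    <- (H_gamma - H_alpha2) / shannon(w)   # shannon(w) equals -sum(w * log(w))

round(c(D_gamma = D_gamma, D_alpha2 = D_alpha2, D_beta2 = D_beta2, Diff2 = Diff2), 3)
```

The rounded values should agree with D_gamma, D_alpha2, D_beta2 and Differentiation2 in the example output.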
R code for Information-based Diversity Partitioning for Any Number of Levels

# Input data:
#   abun: alleles-by-population frequency matrix (or species-by-community frequency matrix)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
# Output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
#   (2) Proportion of total beta information found at each level.
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle “blank” data or “NA” entries. You must replace “blank”
#   or “NA” in your data by 0 or any numerical value. Also, there must be at least
#   one species/allele in a population or a community.

# Main function iDIP
iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga > 0] / n
  G = sum(-gp * log(gp))
  S = length(gp)
  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H - 1] = sum(wi * Ai)
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          ai[, j] = abun[, II]
        } else {
          ai[, j] = rowSums(abun[, II])
        }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i - 1] = sum(wi * Ai)
    }
  }
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }
  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha", (H - 1):1),
                     paste0("D_beta", (H - 1):1),
                     paste0("Proportion", (H - 1):1),
                     paste0("Differentiation", (H - 1):1))
  return(out)
}
Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called “Abun” and “Struc”, respectively) described for species diversity, we also need to input a phylogenetic “Tree” in Newick format.
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns).
NOTE: species names in the “Abun” matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.
# Input data
Data = cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3), c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
rownames(Data) = paste0("Allele", 1:6)
Struc = rbind(rep("Ecosystem", 5), c(rep("Region1", 2), rep("Region2", 3)), paste0("Community", 1:5))
Tree = c("((((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125));")
# Run R function
iDIPphylo(Data, Struc, Tree)

Output is given below:

                [,1]
Faith's PD   321.525
mean_T        94.169
PD_gamma     274.388
PD_alpha2    255.194
PD_alpha1    223.231
PD_beta2       1.075
PD_beta1       1.143
PD_prop2       0.351
PD_prop1       0.649
PD_diff2       0.124
PD_diff1       0.149
The “effective” in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.
(1) The total branch length (Faith’s PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma), up to rounding.
(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2), up to rounding.
(4) PD_prop2 = 0.351 means that 35.1% of the total phylogenetic beta information is found at the regional level. PD_prop1 = 0.649 means that 64.9% of the total phylogenetic beta information is found at the community level.
(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
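
The first two output rows can be checked by hand. The base-R sketch below sums the branch lengths of the example tree to obtain Faith's PD and averages the root-to-tip distances, weighted by pooled allele abundance, to obtain mean_T. It does not use ade4 or iDIPphylo. The decimal placement of the branch lengths is an assumption (it is the placement that reproduces the reported Faith's PD), and the node labels `n12`, `n123` and `n456` are hypothetical names introduced for illustration.

```r
# Branch lengths leading to each tip of the example tree
tip <- c(Allele1 = 16.66254448, Allele2 = 28.86156926, Allele3 = 59.19367445,
         Allele4 = 9.67060281,  Allele5 = 49.65919121, Allele6 = 15.361314)

# Branch lengths leading to each internal node (hypothetical labels)
n12  <- 43.70264926  # ancestor of Allele1 and Allele2
n123 <- 43.49065302  # ancestor of Allele1, Allele2 and Allele3
n456 <- 54.92297125  # ancestor of Allele4, Allele5 and Allele6

# Faith's PD: total branch length of the tree
faith_pd <- sum(tip) + n12 + n123 + n456

# Distance from the root to each tip
root_to_tip <- tip + c(n12 + n123, n12 + n123, n123, n456, n456, n456)

# Pooled allele counts across the five populations (row sums of Data)
counts <- c(44, 19, 31, 41, 24, 6)

# mean_T: abundance-weighted mean root-to-tip distance
mean_T <- sum(counts / sum(counts) * root_to_tip)

round(c(faith_pd = faith_pd, mean_T = mean_T), 3)
```

Under these assumptions the two values should match the “Faith's PD” and “mean_T” rows of the example output. Because the root-to-tip distances differ among tips, the tree is not ultrametric, which is why mean_T depends on the abundances.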
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

# Input data:
#   abun: alleles-by-population frequency matrix (or species-by-community frequency matrix)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
#   tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study
# NOTE: iDIP cannot handle “blank” or “NA” entries. You must replace “blank”
#   or “NA” in your data by 0 or any numerical value. Also, there must be at least
#   one species/allele in a population or a community.
# Output:
#   (1) Faith’s PD: the total sum of branch lengths of a phylogenetic tree.
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root node
#       to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level.
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta
#       information found in Level 1 and Level 2, respectively.
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example,
#       PD_diff1 measures the mean phylogenetic dissimilarity among communities (Level 1)
#       within a region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity
#       among regions (Level 2), etc.

# Three packages, “ade4”, “ape” and “phytools”, must be installed first
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

# Main function iDIPphylo
iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, k] = 1 if branch k lies on the path from the root to leaf i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }
  # pA: abundance descended from each branch, per population
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) pA[, i] = Temp[, i] %*% M
  pB = c(phyloData$leaves, phyloData$nodes)  # branch lengths
  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n
  TT = sum(gp * pB)
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])
  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H - 1] = sum(wi * Ai)
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else {
          pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
        }
      }
      wi = ni / sum(ni)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i - 1] = sum(wi * Ai)
    }
  }
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }
  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H - 1):1),
                     paste0("PD_beta", (H - 1):1),
                     paste0("PD_prop", (H - 1):1),
                     paste0("PD_diff", (H - 1):1))
  return(out)
}
populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.
We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides “true diversity” measures that are consistent across levels of organization, and therefore they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26) than the difference in genetic diversity between these two levels (1244 for E. coruscans and 7 for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
In our hierarchical framework and analysis based on the Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)’s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)’s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon’s entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals, or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithms may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and “true dissimilarity” properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.
The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.
An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.
ACKNOWLEDGEMENTS
This work was assisted through participation in the “Next Generation Genetic Monitoring” Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center’s Coral Reef Ecosystem Division (CRED), with funding from the NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by the National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.
DATA ARCHIVING STATEMENT
All data used in this manuscript are available in DRYAD (https://doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).
ORCID
Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493
Christine Edwards http://orcid.org/0000-0001-8837-4872
REFERENCES
AgrawalAA(2003)CommunitygeneticsNewinsightsintocommunityecol-ogybyintegratingpopulationgeneticsEcology 84(3)543ndash544httpsdoiorg1018900012-9658(2003)084[0543CGNIIC]20CO2
Allen B Kon M amp Bar-Yam Y (2009) A new phylogenetic diversitymeasure generalizing the shannon index and its application to phyl-lostomid bats American Naturalist 174(2) 236ndash243 httpsdoiorg101086600101
AllendorfFWLuikartGHampAitkenSN(2012)Conservation and the genetics of populations2ndedHobokenNJWiley-Blackwell
AndrewsKRMoriwakeVNWilcoxCGrauEGKelleyCPyleR L amp Bowen B W (2014) Phylogeographic analyses of sub-mesophotic snappers Etelis coruscans and Etelis ldquomarshirdquo (FamilyLutjanidae)revealconcordantgeneticstructureacrosstheHawaiianArchipelagoPLoS One 9(4) e91665httpsdoiorg101371jour-nalpone0091665
BattyM(1976)EntropyinspatialaggregationGeographical Analysis 8(1)1ndash21
BeaumontMAZhangWYampBaldingDJ(2002)ApproximateBayesiancomputationinpopulationgeneticsGenetics 162(4)2025ndash2035
BungeJWillisAampWalshF(2014)Estimatingthenumberofspeciesinmicrobial diversity studies Annual Review of Statistics and Its Application 1 edited by S E Fienberg 427ndash445 httpsdoiorg101146annurev-statistics-022513-115654
ChaoAampChiuC-H(2016)Bridgingthevarianceanddiversitydecom-positionapproachestobetadiversityviasimilarityanddifferentiationmeasures Methods in Ecology and Evolution 7(8)919ndash928httpsdoiorg1011112041-210X12551
Chao A Chiu C H Hsieh T C Davis T Nipperess D A amp FaithD P (2015) Rarefaction and extrapolation of phylogenetic diver-sity Methods in Ecology and Evolution 6(4) 380ndash388 httpsdoiorg1011112041-210X12247
ChaoAChiuCHampJost L (2010) Phylogenetic diversitymeasuresbasedonHillnumbersPhilosophical Transactions of the Royal Society of London Series B Biological Sciences 365(1558)3599ndash3609httpsdoiorg101098rstb20100272
ChaoANChiuCHampJostL(2014)Unifyingspeciesdiversityphy-logenetic diversity functional diversity and related similarity and dif-ferentiationmeasuresthroughHillnumbersAnnual Review of Ecology Evolution and Systematics 45 edited by D J Futuyma 297ndash324httpsdoiorg101146annurev-ecolsys-120213-091540
ChaoA amp Jost L (2015) Estimating diversity and entropy profilesviadiscoveryratesofnewspeciesMethods in Ecology and Evolution 6(8)873ndash882httpsdoiorg1011112041-210X12349
ChaoAJostLHsiehTCMaKHSherwinWBampRollinsLA(2015) Expected Shannon entropy and shannon differentiationbetween subpopulations for neutral genes under the finite islandmodel PLoS One 10(6) e0125471 httpsdoiorg101371journalpone0125471
ChiuCHampChaoA(2014)Distance-basedfunctionaldiversitymeasuresand their decompositionA framework based onHill numbersPLoS One 9(7)e100014httpsdoiorg101371journalpone0100014
Coop G Witonsky D Di Rienzo A amp Pritchard J K (2010) Usingenvironmental correlations to identify loci underlying local ad-aptation Genetics 185(4) 1411ndash1423 httpsdoiorg101534genetics110114819
EbleJAToonenRJ Sorenson LBasch LV PapastamatiouY PampBowenBW(2011)EscapingparadiseLarvalexportfromHawaiiin an Indo-Pacific reef fish the yellow tang (Zebrasoma flavescens)Marine Ecology Progress Series 428245ndash258httpsdoiorg103354meps09083
Ellison A M (2010) Partitioning diversity Ecology 91(7) 1962ndash1963httpsdoiorg10189009-16921
FriedlanderAMBrown EK Jokiel P L SmithWRampRodgersK S (2003) Effects of habitat wave exposure and marine pro-tected area status on coral reef fish assemblages in theHawaiianarchipelago Coral Reefs 22(3) 291ndash305 httpsdoiorg101007s00338-003-0317-2
GregoriusHRGilletEMampZieheM (2003)Measuringdifferencesof trait distributions between populationsBiometrical Journal 45(8)959ndash973httpsdoiorg101002(ISSN)1521-4036
Hendry A P (2013) Key questions in the genetics and genomics ofeco-evolutionary dynamics Heredity 111(6) 456ndash466 httpsdoiorg101038hdy201375
HillMO(1973)DiversityandevennessAunifyingnotationanditscon-sequencesEcology 54(2)427ndash432httpsdoiorg1023071934352
JostL(2006)EntropyanddiversityOikos 113(2)363ndash375httpsdoiorg101111j20060030-129914714x
Jost L (2007) Partitioning diversity into independent alpha andbeta components Ecology 88(10) 2427ndash2439 httpsdoiorg10189006-17361
Jost L (2008) GST and its relatives do not measure differen-tiation Molecular Ecology 17(18) 4015ndash4026 httpsdoiorg101111j1365-294X200803887x
JostL(2010)IndependenceofalphaandbetadiversitiesEcology 91(7)1969ndash1974httpsdoiorg10189009-03681
JostLDeVriesPWallaTGreeneyHChaoAampRicottaC (2010)Partitioning diversity for conservation analyses Diversity and Distributions 16(1)65ndash76httpsdoiorg101111j1472-4642200900626x
Karlin E F amp Smouse P E (2017) Allo-allo-triploid Sphagnum x fal-catulum Single individuals contain most of the Holantarctic diver-sity for ancestrally indicative markers Annals of Botany 120(2) 221ndash231
KimuraMampCrowJF(1964)NumberofallelesthatcanbemaintainedinfinitepopulationsGenetics 49(4)725ndash738
KosmanE(1996)DifferenceanddiversityofplantpathogenpopulationsAnewapproachformeasuringPhytopathology 86(11)1152ndash1155
KosmanE (2014)MeasuringdiversityFrom individuals topopulationsEuropean Journal of Plant Pathology 138(3) 467ndash486 httpsdoiorg101007s10658-013-0323-3
MacarthurRH (1965)Patternsof speciesdiversityBiological Reviews 40(4) 510ndash533 httpsdoiorg101111j1469-185X1965tb00815x
Magurran A E (2004) Measuring biological diversity Hoboken NJBlackwellScience
Maire E Grenouillet G Brosse S ampVilleger S (2015) Howmanydimensions are needed to accurately assess functional diversity A pragmatic approach for assessing the quality of functional spacesGlobal Ecology and Biogeography 24(6) 728ndash740 httpsdoiorg101111geb12299
MeirmansPGampHedrickPW(2011)AssessingpopulationstructureF-STand relatedmeasuresMolecular Ecology Resources 11(1)5ndash18httpsdoiorg101111j1755-0998201002927x
Mendes R S Evangelista L R Thomaz S M Agostinho A A ampGomes L C (2008) A unified index to measure ecological di-versity and species rarity Ecography 31(4) 450ndash456 httpsdoiorg101111j0906-7590200805469x
MouchetMGuilhaumonFVillegerSMasonNWHTomasiniJAampMouillotD(2008)Towardsaconsensusforcalculatingdendrogram-based functional diversity indices Oikos 117(5)794ndash800httpsdoiorg101111j0030-1299200816594x
Nei M (1973) Analysis of gene diversity in subdivided populationsProceedings of the National Academy of Sciences of the United States of America 70 3321ndash3323
PetcheyO L ampGaston K J (2002) Functional diversity (FD) speciesrichness and community composition Ecology Letters 5 402ndash411 httpsdoiorg101046j1461-0248200200339x
ProvineWB(1971)The origins of theoretical population geneticsChicagoILUniversityofChicagoPress
RandallJE(1998)ZoogeographyofshorefishesoftheIndo-Pacificre-gion Zoological Studies 37(4)227ndash268
RaoCRampNayakTK(1985)CrossentropydissimilaritymeasuresandcharacterizationsofquadraticentropyIeee Transactions on Information Theory 31(5)589ndash593httpsdoiorg101109TIT19851057082
Scheiner S M Kosman E Presley S J ampWillig M R (2017a) Thecomponents of biodiversitywith a particular focus on phylogeneticinformation Ecology and Evolution 7(16) 6444ndash6454 httpsdoiorg101002ece33199
Scheiner S M Kosman E Presley S J amp Willig M R (2017b)Decomposing functional diversityMethods in Ecology and Evolution 8(7)809ndash820httpsdoiorg1011112041-210X12696
SelkoeKAGaggiottiOETremlEAWrenJLKDonovanMKToonenRJampConsortiuHawaiiReefConnectivity(2016)TheDNAofcoralreefbiodiversityPredictingandprotectinggeneticdiversityofreef assemblages Proceedings of the Royal Society B- Biological Sciences 283(1829)
SelkoeKAHalpernBSEbertCMFranklinECSeligERCaseyKShellipToonenRJ(2009)AmapofhumanimpactstoaldquopristinerdquocoralreefecosystemthePapahānaumokuākeaMarineNationalMonumentCoral Reefs 28(3)635ndash650httpsdoiorg101007s00338-009-0490-z
SherwinWB(2010)Entropyandinformationapproachestogeneticdi-versityanditsexpressionGenomicgeographyEntropy 12(7)1765ndash1798httpsdoiorg103390e12071765
SherwinWBJabotFRushRampRossettoM (2006)Measurementof biological information with applications from genes to land-scapes Molecular Ecology 15(10) 2857ndash2869 httpsdoiorg101111j1365-294X200602992x
SlatkinM ampHudson R R (1991) Pairwise comparisons ofmitochon-drialDNAsequencesinstableandexponentiallygrowingpopulationsGenetics 129(2)555ndash562
SmousePEWhiteheadMRampPeakallR(2015)Aninformationaldi-versityframeworkillustratedwithsexuallydeceptiveorchidsinearlystages of speciationMolecular Ecology Resources 15(6) 1375ndash1384httpsdoiorg1011111755-099812422
VellendMLajoieGBourretAMurriaCKembelSWampGarantD(2014)Drawingecologicalinferencesfromcoincidentpatternsofpop-ulation- and community-level biodiversityMolecular Ecology 23(12)2890ndash2901httpsdoiorg101111mec12756
deVillemereuil P Frichot E Bazin E Francois O amp Gaggiotti O E(2014)GenomescanmethodsagainstmorecomplexmodelsWhenand how much should we trust them Molecular Ecology 23(8)2006ndash2019httpsdoiorg101111mec12705
WeirBSampCockerhamCC(1984)EstimatingF-statisticsfortheanaly-sisofpopulationstructureEvolution 38(6)1358ndash1370
WhitlockMC(2011)GrsquoSTandDdonotreplaceF-STMolecular Ecology 20(6)1083ndash1091httpsdoiorg101111j1365-294X201004996x
Williams IDBaumJKHeenanAHansonKMNadonMOampBrainard R E (2015)Human oceanographic and habitat drivers ofcentral and western Pacific coral reef fish assemblages PLoS One 10(4)e0120516ltGotoISIgtWOS000352135600033httpsdoiorg101371journalpone0120516
WoldaH(1981)SimilarityindexessamplesizeanddiversityOecologia 50(3)296ndash302
WrightS(1951)ThegeneticalstructureofpopulationsAnnals of Eugenics 15(4)323ndash354
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.
How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593
APPENDIX S1. Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an “ideal” population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the “effective” number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
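
The equivalence in this example can be reproduced in a few lines of base R. The sketch below assumes the heterozygosity-based definition of the effective number of alleles, 1/(1 − He), which equals the reciprocal of the sum of squared allele frequencies; the helper names `het` and `n_eff` are illustrative.

```r
# Allele frequencies of the two populations in the example
pop1 <- c(A1 = 0.5,  A2 = 0.5,  A3 = 0,     A4 = 0,     A5 = 0)
pop2 <- c(A1 = 0.01, A2 = 0.10, A3 = 0.665, A4 = 0.215, A5 = 0.01)

# Expected heterozygosity: probability that two randomly drawn copies differ
het <- function(p) 1 - sum(p^2)

# Effective number of alleles: the number of equally frequent alleles
# that would give the same expected heterozygosity
n_eff <- function(p) 1 / sum(p^2)

round(c(He_pop1 = het(pop1), He_pop2 = het(pop2)), 2)
round(c(n_eff_pop1 = n_eff(pop1), n_eff_pop2 = n_eff(pop2)), 2)
```

Both populations have an expected heterozygosity of about 0.5, so both correspond to roughly two equally abundant alleles even though population 2 carries five distinct alleles.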
APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol                      Definition
${}^{q}D$                   abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$                  phylogenetic diversity of order q
$S$                         total number of distinct elements (species/alleles)
$H$                         Shannon entropy
$K$                         number of regions
$J_k$                       number of local populations/communities within region k
$w_{jk}$                    weight given to population/community j of region k
$w_{+k}$                    weight given to region k
$D_\alpha^{(l)}$            alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$             beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$            gamma abundance diversity at level l of the hierarchy
$l$                         superscript to denote population/community, subregion, region, …
$D_\gamma$                  gamma abundance diversity at the ecosystem level
$N_{injk}$                  number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$                   total number of copies of allele/species i in population/community j of region k
$N_{+jk}$                   total number of alleles/individuals in population j of region k
$N_{++k}$                   total number of alleles/individuals in region k
$N_{+++}$                   total number of alleles/individuals in the ecosystem
$p_{i|jk}$                  frequency of allele/species i in population j of region k
$p_{i|+k}$                  frequency of allele/species i in region k
$p_{i|++}$                  frequency of allele/species i in the ecosystem
$H_{\alpha,jk}^{(1)}$       alpha entropy for population j of region k
$H_{\alpha,k}^{(2)}$        alpha entropy for region k
$H_\alpha^{(l)}$            total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$            abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$                         number of branch segments in the phylogenetic tree
$L_i$                       length of branch i = 1, 2, 3, …, B
$a_i$                       total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$                   mean branch length
$T$                         depth of an ultrametric tree (= $\bar{T}$)
$Q$                         Rao’s quadratic entropy
$I$                         phylogenetic entropy
$a_{i|jk}$                  total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$                  total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$                  total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$           alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$            beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$           gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$                 gamma phylogenetic diversity at the ecosystem level
$I_{\alpha,jk}^{(1)}$       alpha phylogenetic entropy for population j of region k
$I_{\alpha,k}^{(2)}$        alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$            total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$         phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)
APPENDIX S3. Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term “allele frequency” with “species frequency”. We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \dots, w_J)$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{m=1}^{J} w_m p_{i|m} \Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$
Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.
Proof. Since $f(x) = -x \log x$ is a concave function, it follows from Jensen's inequality that, for any allele i,

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. Jensen's inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for every allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows. Because $\sum_{m=1}^{J} w_m p_{i|m} \ge w_j p_{i|j}$ for every j and the logarithm is increasing,

$$
\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)\Big] \\
&\le -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\big(w_j p_{i|j}\big)\Big] \\
&= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j \\
&= H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.
\end{aligned}
$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. This completes the proof.
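The bounds of Theorem S3.1 can be checked numerically. The following R sketch is illustrative only: the helper names `shannon` and `two_level_entropies` are hypothetical (they are not part of iDIP), the data are the raw allele counts of the simple five-population example used in the R section of this supplement, and population weights are assumed to be relative population sizes.

```r
# Shannon entropy of a frequency vector (zero entries contribute nothing)
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Gamma entropy, alpha entropy and the upper bound of Theorem S3.1.
# abun: alleles (rows) by populations (columns); w: population weights.
two_level_entropies <- function(abun, w = colSums(abun) / sum(abun)) {
  p_ij <- sweep(abun, 2, colSums(abun), "/")  # p_{i|j}
  p_i  <- as.vector(p_ij %*% w)               # p_{i|+} = sum_j w_j p_{i|j}
  c(H_gamma = shannon(p_i),
    H_alpha = sum(w * apply(p_ij, 2, shannon)),
    bound   = shannon(w))                     # -sum_j w_j ln w_j
}

# Raw allele counts of the five-population example
abun <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
res <- two_level_entropies(abun)
gap <- res[["H_gamma"]] - res[["H_alpha"]]    # lies between 0 and the bound

# The two extreme cases of the theorem (toy data, assumed for illustration)
res_same     <- two_level_entropies(cbind(c(1, 2, 3), c(2, 4, 6)))        # identical frequencies
res_distinct <- two_level_entropies(cbind(c(3, 1, 0, 0), c(0, 0, 2, 2)))  # no shared alleles
```

For the identical populations the gap is zero up to floating-point error, and for the completely distinct populations it equals the bound.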
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$
This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.
(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation satisfies the "true dissimilarity" property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.
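Part (B) can be illustrated numerically. The R sketch below is a minimal check under stated assumptions: the function name `true_dissimilarity_check` is hypothetical, the communities are given equal weights, and the values N = 3, S = 10, A = 4 are arbitrary choices for illustration.

```r
# Shannon entropy of a frequency vector (zero entries contribute nothing)
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Shannon differentiation (Eq. C1) for N equally weighted communities, each
# with S equally common species, exactly A of which are shared by all.
true_dissimilarity_check <- function(N, S, A) {
  n_unshared <- S - A
  p <- matrix(0, nrow = A + N * n_unshared, ncol = N)
  if (A > 0) p[seq_len(A), ] <- 1 / S            # species shared by all communities
  for (j in seq_len(N)) {
    rows <- A + (j - 1) * n_unshared + seq_len(n_unshared)
    p[rows, j] <- 1 / S                          # species unique to community j
  }
  w <- rep(1 / N, N)
  H_gamma <- shannon(as.vector(p %*% w))
  H_alpha <- sum(w * apply(p, 2, shannon))
  (H_gamma - H_alpha) / shannon(w)
}

delta <- true_dissimilarity_check(N = 3, S = 10, A = 4)  # compare with 1 - A/S = 0.6
```

In this setting $H_\gamma - H_\alpha = (1 - A/S)\ln N$ and the normalizing bound is $\ln N$, so the ratio reduces to 1 − A/S.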
Proof. For Part (A), see Appendix S6 in Chao, Jost et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in a three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, the Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$
(See Appendix B for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
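Inequalities (C3) and (C4) can be verified on the simple example analysed in the R section of this supplement (two regions containing two and three populations). The sketch below is illustrative only: the variable names are hypothetical, and weights are assumed to be relative population sizes, so that regional frequencies are obtained by pooling counts within a region.

```r
# Shannon entropy of a frequency vector (zero entries contribute nothing)
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Raw allele counts (alleles in rows, populations in columns) and region labels
abun <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)
regs   <- sort(unique(region))

n    <- sum(abun)
w_jk <- colSums(abun) / n                                   # population weights w_{jk}
w_k  <- sapply(regs, function(k) sum(w_jk[region == k]))    # region weights w_{+k}

# Pooled counts per region (alleles in rows, regions in columns)
reg_abun <- sapply(regs, function(k) rowSums(abun[, region == k, drop = FALSE]))

H_gamma  <- shannon(rowSums(abun) / n)
H_alpha2 <- sum(w_k  * apply(reg_abun, 2, function(x) shannon(x / sum(x))))
H_alpha1 <- sum(w_jk * apply(abun,     2, function(x) shannon(x / sum(x))))

bound_C3 <- shannon(w_k)                                          # -sum_k w_{+k} ln w_{+k}
bound_C4 <- -sum(w_jk * log(w_jk / w_k[match(region, regs)]))     # right-hand side of (C4)

among_regions     <- H_gamma  - H_alpha2   # should lie in [0, bound_C3]
among_populations <- H_alpha2 - H_alpha1   # should lie in [0, bound_C4]
```

Exponentiating the three entropies gives the effective numbers reported as D_gamma and the two alpha diversities in the example output.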
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} w_{+k} p_{i|+k} \ln\Big(\sum_{m=1}^{K} w_{+m} p_{i|+m}\Big)\Big].$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.
To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele i in population j, with population weights $(w_{1k}/w_{+k}, w_{2k}/w_{+k}, \ldots, w_{J_k k}/w_{+k})$, j = 1, 2, …, $J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$
\begin{aligned}
H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)}
&= -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \\
&\le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln\frac{w_{jk}}{w_{+k}}.
\end{aligned}
$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}.$$
This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\big(w_{jk}/w_{+k}\big)}. \tag{C6}$$
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case we can decompose the gamma entropy as

$$
\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\big(w_{jk} p_{i|jk}\big)\Big] \\
&= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big(\ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big)\Big] \\
&= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk}\Big]
 - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\frac{w_{jk}}{w_{+k}}\Big]
 - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k}\Big] \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big].
\end{aligned}
$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
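The special case above, and the general behaviour of Eqs. (C5) and (C6), can be sketched in R as follows. This is not the iDIP code itself: `three_level_delta` is a hypothetical helper for a three-level hierarchy only, weights are assumed to be relative population sizes, and the "distinct" data set is invented so that no allele occurs in more than one population.

```r
# Shannon entropy of a frequency vector (zero entries contribute nothing)
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Normalized differentiation among regions (C5) and among populations within
# regions (C6). abun: alleles x populations counts; region: label per column.
three_level_delta <- function(abun, region) {
  n    <- sum(abun)
  w_jk <- colSums(abun) / n
  regs <- unique(region)
  reg_abun <- sapply(regs, function(k) rowSums(abun[, region == k, drop = FALSE]))
  w_k  <- colSums(reg_abun) / n
  H_gamma  <- shannon(rowSums(abun) / n)
  H_alpha2 <- sum(w_k  * apply(reg_abun, 2, function(x) shannon(x / sum(x))))
  H_alpha1 <- sum(w_jk * apply(abun,     2, function(x) shannon(x / sum(x))))
  # -sum w_jk ln(w_jk / w_+k) equals shannon(w_jk) - shannon(w_k)
  c(Delta2 = (H_gamma  - H_alpha2) / shannon(w_k),
    Delta1 = (H_alpha2 - H_alpha1) / (shannon(w_jk) - shannon(w_k)))
}

region <- c(1, 1, 2, 2, 2)

# Completely distinct populations: population j owns alleles 2j-1 and 2j
distinct <- matrix(0, nrow = 10, ncol = 5)
for (j in 1:5) distinct[c(2 * j - 1, 2 * j), j] <- c(j, 2)
d_distinct <- three_level_delta(distinct, region)   # both measures reach their maximum

# The five-population example of the R section
abun <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
d_example <- three_level_delta(abun, region)
```

For the distinct data both measures equal 1 up to floating-point error; for the example data they can be compared with the differentiation values reported in the iDIP output.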
As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.

Phylogenetic differentiation measures in a three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text; the sums over i run over the branches of the tree)

$$I_\gamma = -\sum_{i} L_i\, a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i} L_i\, a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i} L_i\, a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[I_\alpha^{(2)} - I_\alpha^{(1)}\big] + \big[I_\gamma - I_\alpha^{(2)}\big]. \tag{C7}$$
Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text), each in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
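The role of the tree depth T in the phylogenetic bounds can be illustrated with a two-level toy example. The R sketch below is a hypothetical illustration, not part of iDIP.phylo: it assumes the small ultrametric tree ((A:1,B:1):1,(C:1,D:1):1) of depth T = 2, two equally weighted populations, and invented tip abundances.

```r
# Branch lengths L_i of the toy tree ((A:1,B:1):1,(C:1,D:1):1)
L <- c(A = 1, B = 1, C = 1, D = 1, AB = 1, CD = 1)
T_depth <- 2

# M[t, b] = 1 if tip t descends from (or is) branch b
M <- rbind(A = c(1, 0, 0, 0, 1, 0),
           B = c(0, 1, 0, 0, 1, 0),
           C = c(0, 0, 1, 0, 0, 1),
           D = c(0, 0, 0, 1, 0, 1))
colnames(M) <- names(L)

# Phylogenetic entropy -sum_i L_i a_i ln a_i for tip relative abundances p
phylo_entropy <- function(p) {
  a <- as.vector(p %*% M)            # branch abundances a_i
  keep <- a > 0
  -sum(L[keep] * a[keep] * log(a[keep]))
}

# I_gamma - I_alpha for a tips-by-populations matrix P and weights w
phylo_gap <- function(P, w) {
  I_gamma <- phylo_entropy(as.vector(P %*% w))
  I_alpha <- sum(w * apply(P, 2, phylo_entropy))
  I_gamma - I_alpha
}

w <- c(0.5, 0.5)
upper <- -T_depth * sum(w * log(w))          # two-level analogue of the bound in (C8)

gap_same     <- phylo_gap(cbind(c(.4, .1, .3, .2), c(.4, .1, .3, .2)), w)   # identical populations
gap_distinct <- phylo_gap(cbind(c(.5, .5, 0, 0), c(0, 0, .5, .5)), w)       # no shared branches
gap_mixed    <- phylo_gap(cbind(c(.5, .5, 0, 0), c(.25, .25, .25, .25)), w) # partial overlap
```

The gap is zero for identical populations, attains the upper bound when the two populations occupy different clades, and falls strictly between the two for the partially overlapping case.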
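SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represent the ecosystem. Shannon entropies $H$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 layout: rows are the hierarchical levels (3 Ecosystem, 2 Region, 1 Population or Community); columns are Within, Between, Total and Decomposition. Population frequencies $p_{i|11}, p_{i|21}, p_{i|31}, p_{i|42}, p_{i|52}$ give the entropies $H_{\alpha 11}^{(1)}, H_{\alpha 21}^{(1)}, H_{\alpha 31}^{(1)}, H_{\alpha 42}^{(1)}, H_{\alpha 52}^{(1)}$ and their total $H_\alpha^{(1)}$; regional frequencies $p_{i|+1}, p_{i|+2}$ give $H_{\alpha 1}^{(2)}, H_{\alpha 2}^{(2)}$ and their total $H_\alpha^{(2)}$; ecosystem frequencies $p_{i|++}$ give $H_\gamma$.]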
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for the 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5, p_1 + p_2, p_1 + p_2 + p_3, p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2 layout: tree with tips $p_1, \ldots, p_5$; interior nodes labelled $p_1 + p_2$, $p_1 + p_2 + p_3$ and $p_4 + p_5$; root labelled MRCA.]
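The extended abundance set of Figure S2 can be built in a few lines of R. The sketch below is illustrative: the relative abundances and the node labels `node6` to `node8` are hypothetical, while the topology (which tips descend from which interior node) follows the caption.

```r
# Hypothetical relative abundances of the five species/alleles (sum to 1)
p <- c(p1 = 0.30, p2 = 0.20, p3 = 0.10, p4 = 0.25, p5 = 0.15)

# Tips descended from each interior node of the tree in Figure S2
interior <- list(node6 = c(1, 2),      # p1 + p2
                 node7 = c(1, 2, 3),   # p1 + p2 + p3
                 node8 = c(4, 5))      # p4 + p5

# Extended set {a_1, ..., a_8}: tip abundances followed by node abundances
a <- c(p, sapply(interior, function(tips) sum(p[tips])))
```

With these abundances the three interior-node values are 0.5, 0.6 and 0.4.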
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider that there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
  Region 1: Population 1, Population 2
  Region 2: Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):
Pop1 Pop2 Pop3 Pop4 Pop5
Allele1 1 16 2 10 15
Allele2 0 0 0 5 14
Allele3 7 12 11 1 0
Allele4 0 5 14 1 21
Allele5 2 1 0 11 10
Allele6 0 1 3 2 0
The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure with any number of levels can be expressed in a similar manner.

Pop1 Pop2 Pop3 Pop4 Pop5
Level3 Ecosystem Ecosystem Ecosystem Ecosystem Ecosystem
Level2 Region1 Region1 Region2 Region2 Region2
Level1 Population1 Population2 Population3 Population4 Population5

For the above simple example, the input data for the R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))
# Run R function
iDIP(Data, Struc)
Output is given below:

                   [,1]
D_gamma           5.272
D_alpha.2         4.679
D_alpha.1         3.540
D_beta.2          1.127
D_beta.1          1.322
Proportion.2      0.300
Proportion.1      0.700
Differentiation.2 0.204
Differentiation.1 0.310
We give simple interpretations for the above output (all "effective" numbers are in the sense of equally abundant alleles/populations/regions).
(1a) D_gamma = 5.272 is interpreted as that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha.2 = 4.679 is interpreted as that each region contains 4.679 allele equivalents. D_beta.2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 × 1.127 = 5.272 (= D_gamma).
(1c) D_alpha.1 = 3.540 is interpreted as that each population within a region contains 3.540 allele equivalents. D_beta.1 = 1.322 is interpreted as that there are 1.32 population equivalents per region. Here 1.322 × 3.540 = 4.679 allele equivalents per region (= D_alpha.2).
(2) Proportion.2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion.1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation.2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation.1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.310, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
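The proportions and differentiation values can be recovered, up to rounding, from the reported beta diversities together with the weights implied by the raw counts (population totals 10, 35, 30, 30 and 60 out of 165, hence region weights 45/165 and 120/165). The following worked check is a sketch based on those rounded values and on the assumption that weights are relative sizes:

```latex
\begin{aligned}
H_\gamma - H_\alpha^{(2)} &= \ln 1.127 \approx 0.1196, \qquad
H_\alpha^{(2)} - H_\alpha^{(1)} = \ln 1.322 \approx 0.2791, \\
\text{Proportion at level 2} &\approx \frac{0.1196}{0.1196 + 0.2791} \approx 0.300, \qquad
\text{Proportion at level 1} \approx \frac{0.2791}{0.1196 + 0.2791} \approx 0.700, \\
-\sum_{k} w_{+k} \ln w_{+k} &= -\tfrac{45}{165}\ln\tfrac{45}{165} - \tfrac{120}{165}\ln\tfrac{120}{165} \approx 0.586, \\
-\sum_{k}\sum_{j} w_{jk} \ln\frac{w_{jk}}{w_{+k}} &= -\sum_{k}\sum_{j} w_{jk} \ln w_{jk} + \sum_{k} w_{+k} \ln w_{+k} \approx 1.487 - 0.586 = 0.901, \\
\Delta_D^{(2)} &\approx \frac{0.1196}{0.586} \approx 0.204, \qquad
\Delta_D^{(1)} \approx \frac{0.2791}{0.901} \approx 0.310 .
\end{aligned}
```

The logarithm of each beta diversity is the corresponding among-group information in Eq. (C2), which is why the two proportions sum to 1.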
# R code for Information-based Diversity Partitioning for Any Number of Levels
# input data:
#   abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
# output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
#   (2) Proportion of total beta information found at each level.
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation.1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation.2 measures the mean dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.

# Main function iDIP
iDIP = function(abun, struc){
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga>0]/n                      # pooled relative frequencies
  G = sum(-gp*log(gp))                 # gamma entropy
  S = length(gp)

  H = nrow(struc)                      # number of hierarchical levels
  A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
  Diff = numeric(H-1); Prop = numeric(H-1)

  # lowest level (populations/communities)
  wi = colSums(abun)/n
  W[H-1] = -sum(wi[wi>0]*log(wi[wi>0]))
  pi = sapply(1:N, function(k) abun[,k]/sum(abun[,k]))
  Ai = sapply(1:N, function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])))
  A[H-1] = sum(wi*Ai)

  # intermediate levels: pool populations belonging to the same aggregate
  if(H>2){
    for(i in 2:(H-1)){
      I = unique(struc[i,]); NN = length(I)
      ai = matrix(0, ncol=NN, nrow=nrow(abun))
      for(j in 1:NN){
        II = which(struc[i,]==I[j])
        if(length(II)==1){ ai[,j] = abun[,II]
        }else{ ai[,j] = rowSums(abun[,II]) }
      }
      pi = sapply(1:NN, function(k) ai[,k]/sum(ai[,k]))
      wi = colSums(ai)/sum(ai)
      W[i-1] = -sum(wi*log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])))
      A[i-1] = sum(wi*Ai)
    }
  }

  total = G-A[H-1]
  Diff[1] = (G-A[1])/W[1]
  Prop[1] = (G-A[1])/total
  B[1] = exp(G)/exp(A[1])
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i] = (A[i-1]-A[i])/(W[i]-W[i-1])
      Prop[i] = (A[i-1]-A[i])/total
      B[i] = exp(A[i-1])/exp(A[i])
    }
  }
  Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop

  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol=1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha.", (H-1):1),
                     paste0("D_beta.", (H-1):1),
                     paste0("Proportion.", (H-1):1),
                     paste0("Differentiation.", (H-1):1)
  )
  return(out)
}
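Because iDIP cannot handle blank or NA entries and requires at least one species/allele per population, it may help to check the abundance matrix before calling it. The helper below is a hypothetical sketch (`prepare_abun` is not part of the published code) and assumes that missing entries should be treated as zero counts, one of the options the note above allows.

```r
# Hypothetical pre-processing helper, not part of iDIP.
# abun: numeric alleles-by-population (or species-by-community) matrix.
prepare_abun <- function(abun) {
  abun <- as.matrix(abun)
  if (!is.numeric(abun)) stop("abundances must be numeric")
  abun[is.na(abun)] <- 0                      # assumption: NA means zero count
  if (any(abun < 0)) stop("abundances must be non-negative")
  if (any(colSums(abun) == 0))
    stop("every population/community needs at least one species/allele")
  abun
}

# Example: one missing entry is replaced by 0
cleaned <- prepare_abun(matrix(c(1, NA, 2, 3), nrow = 2))

# Example: a population without any allele is rejected
rejected <- inherits(try(prepare_abun(matrix(c(0, 0, 1, 2), nrow = 2)), silent = TRUE),
                     "try-error")
```

The cleaned matrix can then be passed as the first argument of iDIP together with the structure matrix.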
Phylogenetic Diversity (R function iDIP.phylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described for species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) = paste0("Allele",1:6)
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")
# Run R function
iDIP.phylo(Data, Struc, Tree)
Output is given below:

               [,1]
Faith's PD  321.525
mean_T       94.169
PD_gamma    274.388
PD_alpha.2  255.194
PD_alpha.1  223.231
PD_beta.2     1.075
PD_beta.1     1.143
PD_prop.2     0.351
PD_prop.1     0.649
PD_diff.2     0.124
PD_diff.1     0.149
The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.
(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 is interpreted as that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha.2 = 255.194 is interpreted as that the effective total branch length per region is 255.194. PD_beta.2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 × 1.075 = 274.388 (= PD_gamma).
(3c) PD_alpha.1 = 223.231 is interpreted as that the effective total branch length per population within each region is 223.231. PD_beta.1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 × 1.143 = 255.194 (= PD_alpha.2).
(4) PD_prop.2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop.1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff.2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff.1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
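The first three output values can be recomputed in base R without any phylogenetics package. The sketch below is an independent illustration rather than the published function: branch lengths are read from the Newick string of the example, the interior branch names `n12`, `n123` and `n456` are hypothetical labels, and the tip-to-branch membership matrix is written out by hand from the tree topology.

```r
# Branch lengths of the example tree (tips first, then interior branches)
L <- c(Allele1 = 16.66254448, Allele2 = 28.86156926, Allele3 = 59.19367445,
       Allele4 = 9.67060281,  Allele5 = 49.65919121, Allele6 = 15.361314,
       n12 = 43.70264926, n123 = 43.49065302, n456 = 54.92297125)

# M[t, b] = 1 if tip t descends from (or is) branch b
tips <- paste0("Allele", 1:6)
M <- matrix(0, nrow = 6, ncol = 9, dimnames = list(tips, names(L)))
M[cbind(1:6, 1:6)] <- 1          # each tip lies on its own terminal branch
M[1:2, "n12"]  <- 1              # (Allele1, Allele2)
M[1:3, "n123"] <- 1              # ((Allele1, Allele2), Allele3)
M[4:6, "n456"] <- 1              # (Allele4, Allele5, Allele6)

# Pooled relative abundances from the example count matrix
abun <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
p <- rowSums(abun) / sum(abun)
a <- as.vector(p %*% M)          # branch (node) abundances in the pooled assemblage

faith_PD <- sum(L[a > 0])                                   # total branch length
mean_T   <- sum(L * a)                                      # abundance-weighted root-to-tip distance
PD_gamma <- exp(-sum(L * (a / mean_T) * log(a / mean_T)))   # gamma PD of order 1
```

The three quantities can be compared with the Faith's PD, mean_T and PD_gamma rows of the output above.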
# R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
# input data:
#   abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
#   tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.
# output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root node
#       to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level.
#   (4) PD_prop.1 and PD_prop.2 measure the proportions of total phylogenetic beta
#       information found in Level 1 and Level 2, respectively.
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example,
#       PD_diff.1 measures the mean phylogenetic dissimilarity among communities (Level 1)
#       within a region (Level 2); PD_diff.2 measures the mean phylogenetic dissimilarity
#       among regions (Level 2), etc.

# Three packages "ade4", "ape" and "phytools" must be installed first
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

# Main function iDIP.phylo
iDIP.phylo = function(abun, struc, tree){
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, b] = 1 if branch b lies on the path from the root to leaf i
  M = matrix(0, nrow=length(phyloData$leaves), ncol=length(nodenames),
             dimnames=list(names(phyloData$leaves), nodenames))
  for(i in 1:length(phyloData$leaves)){
    M[i,][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }

  # node abundances in each population
  pA = matrix(0, ncol=ncol(abun), nrow=length(nodenames),
              dimnames=list(nodenames, colnames(abun)))
  for(i in 1:ncol(abun)){ pA[,i] = Temp[,i] %*% M }
  pB = c(phyloData$leaves, phyloData$nodes)      # branch lengths

  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga/n; TT = sum(gp*pB)
  G = sum(-pB[gp>0]*gp[gp>0]/TT*log(gp[gp>0]/TT))
  PD = sum(pB[gp>0])

  H = nrow(struc)
  A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
  Diff = numeric(H-1); Prop = numeric(H-1)

  wi = colSums(abun)/n
  W[H-1] = -sum(wi[wi>0]*log(wi[wi>0]))
  pi = sapply(1:N, function(k) pA[,k]/sum(abun[,k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)))
  A[H-1] = sum(wi*Ai)

  if(H>2){
    for(i in 2:(H-1)){
      I = unique(struc[i,]); NN = length(I)
      pi = matrix(0, ncol=NN, nrow=nrow(pA)); ni = numeric(NN)
      for(j in 1:NN){
        II = which(struc[i,]==I[j])
        if(length(II)==1){ pi[,j] = pA[,II]/sum(abun[,II]); ni[j] = sum(abun[,II])
        }else{ pi[,j] = rowSums(pA[,II])/sum(abun[,II]); ni[j] = sum(abun[,II]) }
      }
      # pi = sapply(1:NN, function(k) ai[,k]/sum(ai[,k]))
      wi = ni/sum(ni)
      W[i-1] = -sum(wi*log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)))
      A[i-1] = sum(wi*Ai)
    }
  }

  total = G-A[H-1]
  Diff[1] = (G-A[1])/W[1]
  Prop[1] = (G-A[1])/total
  B[1] = exp(G)/exp(A[1])
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i] = (A[i-1]-A[i])/(W[i]-W[i-1])
      Prop[i] = (A[i-1]-A[i])/total
      B[i] = exp(A[i-1])/exp(A[i])
    }
  }
  # Gamma = exp(G)/TT; Alpha = exp(A)/TT; Diff = Diff; Prop = Prop
  Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop

  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol=1)
  # out1 = iDIP(abun, struc)
  # out = cbind(out1, out2)
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha.", (H-1):1),
                     paste0("PD_beta.", (H-1):1),
                     paste0("PD_prop.", (H-1):1),
                     paste0("PD_diff.", (H-1):1)
  )
  return(out)
}
ACKNOWLEDGEMENTS
This work was assisted through participation in "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.

DATA ARCHIVING STATEMENT

All data used in this manuscript are available in DRYAD (https://doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).

ORCID

Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493
Christine Edwards http://orcid.org/0000-0001-8837-4872
REFERENCES
Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2
Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101
Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.
Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665
Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.
Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.
Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654
Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551
Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247
Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272
Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540
Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471
Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014
Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819
Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083
Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1
Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2
Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036
Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75
Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352
Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x
Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x
Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1
Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x
Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum × falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.
Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.
Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.
Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3
Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x
Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.
Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x
Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x
Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x
Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.
Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x
Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.
Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.
Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199
Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696
Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B: Biological Sciences, 283(1829).
Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z
Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765
Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x
Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.
Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422
Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756
de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.
Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x
Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516
Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.
Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.
How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593
APPENDIX S1: Effective number of alleles – A simple example
Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.
Allele Population1 Population2
A1 0.5 0.01
A2 0.5 0.10
A3 0 0.665
A4 0 0.215
A5 0 0.01
He 0.5 0.5
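The table can be checked with a few lines of R. This is a minimal sketch, assuming the usual definitions of expected heterozygosity, He = 1 − Σ p_i², and of the heterozygosity-based effective number of alleles, 1/(1 − He); the names `he` and `effective_alleles` are illustrative and do not come from the authors' code.

```r
# Allele frequencies from the Appendix S1 table
pop1 <- c(0.5, 0.5, 0, 0, 0)
pop2 <- c(0.01, 0.10, 0.665, 0.215, 0.01)

# Expected heterozygosity: probability that two randomly drawn alleles differ
he <- function(p) 1 - sum(p^2)

# Effective number of alleles: number of equally frequent alleles
# that would give the same heterozygosity
effective_alleles <- function(p) 1 / sum(p^2)

he_pop1 <- he(pop1)
he_pop2 <- he(pop2)
ne_pop1 <- effective_alleles(pop1)
ne_pop2 <- effective_alleles(pop2)
print(round(c(he_pop1, he_pop2, ne_pop1, ne_pop2), 3))
```

Both populations give a heterozygosity of about 0.5, so population 2, despite carrying five alleles, has an effective number of alleles of about two.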
APPENDIX S2: Table of parameters and variables used
Definitions of the various parameters and variables, following the order in which they appear in the text.
Symbol: Definition
$D_q$: abundance diversity of order q, also referred to as Hill number of order q
$PD_q$: phylogenetic diversity of order q
$S$: total number of distinct elements (species/alleles)
$H$: Shannon entropy
$K$: number of regions
$J_k$: number of local populations/communities within region k
$w_{jk}$: weight given to population/community j of region k
$w_{+k}$: weight given to region k
$D_\alpha^{(l)}$: alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$: beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$: gamma abundance diversity at level l of the hierarchy
$l$: superscript to denote population/community, subregion, region, …
$D_\gamma$: gamma abundance diversity at the ecosystem level
$N_{injk}$: number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$: total number of copies of allele/species i in population/community j of region k
$N_{+jk}$: total number of alleles/individuals in population j of region k
$N_{++k}$: total number of alleles/individuals in region k
$N_{+++}$: total number of alleles/individuals in the ecosystem
$p_{i|jk}$: frequency of allele/species i in population j of region k
$p_{i|+k}$: frequency of allele/species i in region k
$p_{i|++}$: frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$: alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$: alpha entropy for region k
$H_\alpha^{(l)}$: total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$: abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$: number of branch segments in the phylogenetic tree
$L_i$: length of branch $i = 1, 2, 3, \ldots, B$
$a_i$: total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$: mean branch length
$T$: depth of an ultrametric tree ($= \bar{T}$)
$Q$: Rao's quadratic entropy
$I$: phylogenetic entropy
$a_{i|jk}$: total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$: total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$: total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$: alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$: beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$: gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$: gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$: alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$: alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$: total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$: phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)
APPENDIX S3: Derivation of Differentiation Measures
We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.
Shannon differentiation measure in two-level hierarchy (ecosystem and populations)
Assume that there are $J$ populations and $S$ alleles in an ecosystem. Denote the relative frequency of allele $i$ within population $j$ as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any $j = 1, 2, \ldots, J$. For any given population weights $\{w_1, w_2, \ldots, w_J\}$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele $i$ in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as
$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ln \left( \sum_{m=1}^{J} w_m p_{i|m} \right)$$
and
$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$
Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:
$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$
When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the $J$ populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.
Proof. Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele $i$ we have
$$-\left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ln \left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$
Summing over all alleles, we then obtain
$$-\sum_{i=1}^{S} \left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ln \left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$
This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele $i = 1, 2, \ldots, S$, i.e., all $J$ populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows. Because $\sum_{m=1}^{J} w_m p_{i|m} \ge w_j p_{i|j}$ for every $j$ and the logarithm is increasing,
$$H_\gamma = -\sum_{i=1}^{S} \left( \sum_{j=1}^{J} w_j p_{i|j} \ln \left( \sum_{m=1}^{J} w_m p_{i|m} \right) \right) \le -\sum_{i=1}^{S} \left( \sum_{j=1}^{J} w_j p_{i|j} \ln \left( w_j p_{i|j} \right) \right)$$
$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$
When all $J$ populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as
$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \quad \text{(C1)}$$
This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the $J$ populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.
Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.
(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.
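The boundary values of Eq. (C1) and the "true dissimilarity" property in Part (B) can be illustrated numerically. The sketch below is not the authors' code: `shannon_entropy` and `shannon_diff` are illustrative helper names, the frequency matrices are made-up toy data, and the Part (B) check assumes equal community weights.

```r
# Shannon entropy of a frequency vector (zero entries contribute nothing)
shannon_entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Shannon differentiation, Eq. (C1).
# P: alleles-by-populations matrix of relative frequencies (columns sum to 1)
# w: population weights (sum to 1)
shannon_diff <- function(P, w) {
  h_gamma <- shannon_entropy(as.vector(P %*% w))
  h_alpha <- sum(w * apply(P, 2, shannon_entropy))
  (h_gamma - h_alpha) / shannon_entropy(w)
}

w <- c(0.2, 0.3, 0.5)

# Identical populations: every column equals (0.5, 0.3, 0.2)
P_same <- matrix(c(0.5, 0.3, 0.2), nrow = 3, ncol = 3)
d_same <- shannon_diff(P_same, w)

# Completely distinct populations: one private allele each
P_distinct <- diag(3)
d_distinct <- shannon_diff(P_distinct, w)

# Part (B): 3 communities, S = 4 equally common species each,
# A = 1 species shared by all, the other 3 private to each community
P_shared <- matrix(0, nrow = 10, ncol = 3)
P_shared[1, ] <- 1 / 4                      # the shared species
for (j in 1:3) P_shared[(3 * j - 1):(3 * j + 1), j] <- 1 / 4
d_shared <- shannon_diff(P_shared, rep(1 / 3, 3))

print(c(d_same, d_distinct, d_shared))
```

Under these assumptions the three values are 0, 1 and 1 − A/S = 0.75, matching the minimum, the maximum and the Part (B) statement.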
Proof. For Part (A), see Appendix S6 in Chao, Jost et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).
Shannon differentiation measures in three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as
$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$
$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$
(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is
$$H_\gamma = H_\alpha^{(1)} + \left[ H_\alpha^{(2)} - H_\alpha^{(1)} \right] + \left[ H_\gamma - H_\alpha^{(2)} \right]. \quad \text{(C2)}$$
Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.
Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:
$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \quad \text{(C3)}$$
$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \quad \text{(C4)}$$
When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are $K$ regions with allele relative frequency $p_{i|+k}$ for allele $i$ in region $k$, with the region weights $\{w_{+1}, w_{+2}, \ldots, w_{+K}\}$. Note that we can express $p_{i|++}$ as
$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$
Then the "gamma" entropy for this two-level system is
$$H_\gamma = -\sum_{i=1}^{S} \left( \sum_{k=1}^{K} w_{+k} p_{i|+k} \ln \left( \sum_{m=1}^{K} w_{+m} p_{i|+m} \right) \right).$$
The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.
To prove Eq. (C4), we consider the following two-level hierarchy (region $k$ and all populations within region $k$): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele $i$ in population $j$, with population weights $\{w_{1k}/w_{+k}, w_{2k}/w_{+k}, \ldots, w_{J_k k}/w_{+k}\}$, $j = 1, 2, \ldots, J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region $k$, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding "alpha" entropy is
$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$
Then Theorem S3.1 leads to
$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$
Summing over $k$ with weight $w_{+k}$ on both sides of the above inequality, we obtain
$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$
This proves Eq. (C4). From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:
$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \quad \text{(C5)}$$
Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:
$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \quad \text{(C6)}$$
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case we can decompose the gamma entropy as
$$H_\gamma = -\sum_{i=1}^{S} \left( \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \left( w_{jk} p_{i|jk} \right) \right)$$
$$= -\sum_{i=1}^{S} \left( \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \left[ \ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k} \right] \right)$$
$$= -\sum_{i=1}^{S} \left( \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk} \right) - \sum_{i=1}^{S} \left( \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \frac{w_{jk}}{w_{+k}} \right) - \sum_{i=1}^{S} \left( \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k} \right)$$
$$= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k}$$
$$\equiv H_\alpha^{(1)} + \left[ H_\alpha^{(2)} - H_\alpha^{(1)} \right] + \left[ H_\gamma - H_\alpha^{(2)} \right].$$
In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
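The completely distinct special case can be checked numerically for a three-level hierarchy. The following is a minimal sketch on made-up toy data (five populations in two regions, two private alleles each); the weights, the helper `shannon_entropy` and all variable names are assumptions for illustration only.

```r
shannon_entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Toy hierarchy: populations 1-2 in region 1, populations 3-5 in region 2
region <- c(1, 1, 2, 2, 2)
w_jk <- c(0.10, 0.20, 0.30, 0.15, 0.25)            # population weights w_{jk}
w_k <- as.numeric(tapply(w_jk, region, sum))       # region weights w_{+k}

# Each population carries two private alleles with frequencies 0.4 and 0.6
P <- matrix(0, nrow = 10, ncol = 5)
for (j in 1:5) P[(2 * j - 1):(2 * j), j] <- c(0.4, 0.6)

# Region-level frequencies p_{i|+k}
P_region <- sapply(1:2, function(k) {
  idx <- which(region == k)
  as.vector(P[, idx, drop = FALSE] %*% w_jk[idx]) / sum(w_jk[idx])
})

h_gamma  <- shannon_entropy(as.vector(P %*% w_jk))
h_alpha2 <- sum(w_k * apply(P_region, 2, shannon_entropy))
h_alpha1 <- sum(w_jk * apply(P, 2, shannon_entropy))

# Upper bounds from Eqs. (C3) and (C4)
max_among_regions <- -sum(w_k * log(w_k))
max_within_region <- -sum(w_jk * log(w_jk / w_k[region]))

diff2 <- (h_gamma - h_alpha2) / max_among_regions   # Eq. (C5)
diff1 <- (h_alpha2 - h_alpha1) / max_within_region  # Eq. (C6)
print(c(diff2, diff1))
```

With no shared alleles both bounds are attained, so both differentiation measures equal 1 for these toy data.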
As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.
Phylogenetic differentiation measures in three-level hierarchy
For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)
$$I_\gamma = -\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k},$$
$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}.$$
Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:
$$I_\gamma = I_\alpha^{(1)} + \left[ I_\alpha^{(2)} - I_\alpha^{(1)} \right] + \left[ I_\gamma - I_\alpha^{(2)} \right]. \quad \text{(C7)}$$
Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.
Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth $T$:
$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \quad \text{(C8)}$$
$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \quad \text{(C9)}$$
When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
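The upper bound involving the tree depth $T$ can be illustrated with a small example. The sketch below uses a two-level hierarchy (two populations, no regions) and a made-up ultrametric tree of depth 2 in which the populations share no branches; the tree, the weights and the helper `phylo_entropy` are assumptions for illustration, not part of the authors' code.

```r
# Hypothetical ultrametric tree of depth 2 with branches:
# tips a and b (length 1 each), their ancestral branch ab (length 1),
# and tip c (length 2). Population 1 holds a and b; population 2 holds c.
L <- c(a = 1, b = 1, ab = 1, c = 2)
tree_depth <- 2
w <- c(0.4, 0.6)

# Branch-by-population matrix of node abundances a_{i|j}
A <- cbind(pop1 = c(0.5, 0.5, 1, 0),
           pop2 = c(0, 0, 0, 1))

# Phylogenetic entropy: I = -sum_i L_i a_i ln a_i
phylo_entropy <- function(a, L) {
  keep <- a > 0
  -sum(L[keep] * a[keep] * log(a[keep]))
}

i_gamma <- phylo_entropy(as.vector(A %*% w), L)
i_alpha <- sum(w * apply(A, 2, phylo_entropy, L = L))

# Maximum of I_gamma - I_alpha for phylogenetically distinct populations
max_diff <- -tree_depth * sum(w * log(w))
pd_diff <- (i_gamma - i_alpha) / max_diff
print(pd_diff)
```

Because the two populations share no branches, the normalized phylogenetic differentiation equals 1 in this toy case.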
SUPPLEMENTARY INFORMATION
Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represents the ecosystem. Shannon entropies are calculated from allele/species abundances at each level of the hierarchy and are then transformed into effective numbers using eq. 1.
[Figure S1: diagram with three levels (1 Population or Community, 2 Region, 3 Ecosystem) and columns Within, Between, Total and Decomposition. Populations 1–3 (frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$) belong to region 1 and populations 4–5 ($p_{i|42}$, $p_{i|52}$) to region 2; region frequencies are $p_{i|+1}$ and $p_{i|+2}$, and ecosystem frequencies are $p_{i|++}$.]
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set, where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species $i$.
[Figure S2: ultrametric tree with tips $p_1$ to $p_5$, internal nodes $p_1 + p_2$, $p_4 + p_5$ and $p_1 + p_2 + p_3$, and the root labelled MRCA.]
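The extended abundance set described in the caption of Figure S2 can be built with a node-by-tip membership matrix. The sketch below assumes made-up tip abundances and made-up branch lengths that make the tree ultrametric with depth 4; none of these numbers come from the figure.

```r
# Hypothetical relative abundances of the five tips (sum to 1)
p <- c(0.10, 0.20, 0.30, 0.25, 0.15)

# Membership matrix: one row per node (5 tips, then 3 internal nodes),
# one column per tip; entry 1 means the tip descends from that node
M <- rbind(diag(5),
           c(1, 1, 0, 0, 0),   # node with abundance p1 + p2
           c(1, 1, 1, 0, 0),   # node with abundance p1 + p2 + p3
           c(0, 0, 0, 1, 1))   # node with abundance p4 + p5

# Extended abundance set a_1, ..., a_8
a <- as.vector(M %*% p)

# Hypothetical branch lengths, in the same node order, chosen so that
# every root-to-tip path has length 4
L <- c(1, 1, 2, 3, 3, 1, 2, 1)

# The abundance-weighted mean branch length equals the tree depth
# when the tree is ultrametric
mean_T <- sum(L * a)
faith_pd <- sum(L)
print(a)
print(c(mean_T = mean_T, faith_pd = faith_pd))
```

For these assumed values the three internal-node abundances are 0.30, 0.60 and 0.40, and the mean branch length equals the depth of 4, as expected for an ultrametric tree.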
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities
Species/Allelic Diversity (R function iDIP)
Input should include two data matrices (called "Abun" and "Struc", respectively):
(a) Abun: specifying species/allele (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.
Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider that there are two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as follows:
Ecosystem
  Region 1: Population 1, Population 2
  Region 2: Population 3, Population 4, Population 5
Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):
Pop1 Pop2 Pop3 Pop4 Pop5
Allele1 1 16 2 10 15
Allele2 0 0 0 5 14
Allele3 7 12 11 1 0
Allele4 0 5 14 1 21
Allele5 2 1 0 11 10
Allele6 0 1 3 2 0
The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure with any number of levels can be expressed in a similar manner.
Pop1 Pop2 Pop3 Pop4 Pop5
Level3 Ecosystem Ecosystem Ecosystem Ecosystem Ecosystem
Level2 Region1 Region1 Region2 Region2 Region2
Level1 Population1 Population2 Population3 Population4 Population5
For the above simple example, the input data for the R function iDIP (code given below) are as follows.
# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))
# Run R function
iDIP(Data, Struc)
Output is given below:
[,1]
D_gamma 5.272
D_alpha2 4.679
D_alpha1 3.540
D_beta2 1.127
D_beta1 1.322
Proportion2 0.300
Proportion1 0.700
Differentiation2 0.204
Differentiation1 0.310
We give simple interpretations for the above output (every "effective" quantity is in the sense of equally abundant alleles/populations/regions).
(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 ≈ 5.272 (= D_gamma).
(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 ≈ 4.679 allele equivalents per region (= D_alpha2).
(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
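The reported values can be traced step by step from the example data. The sketch below recomputes the three entropies for this specific three-level example directly from the formulas of Appendix S3, using relative sizes as weights; it is a simplified illustration with hypothetical variable names (`region`, `entropy`, `Region`), not the general iDIP function.

```r
# Example data: alleles in rows, populations in columns
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)   # region membership of each population

# Shannon entropy of a vector of counts or weights
entropy <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Pool allele counts within each region
Region <- sapply(1:2, function(k) rowSums(Data[, region == k, drop = FALSE]))

w_pop <- colSums(Data) / sum(Data)       # population weights
w_reg <- colSums(Region) / sum(Region)   # region weights

h_gamma  <- entropy(rowSums(Data))
h_alpha2 <- sum(w_reg * apply(Region, 2, entropy))
h_alpha1 <- sum(w_pop * apply(Data, 2, entropy))

# Effective numbers (exponential of entropy) and beta diversities
d_gamma  <- exp(h_gamma)
d_alpha2 <- exp(h_alpha2)
d_alpha1 <- exp(h_alpha1)
d_beta2  <- d_gamma / d_alpha2
d_beta1  <- d_alpha2 / d_alpha1

# Normalized differentiation among regions and among populations within regions
diff2 <- (h_gamma - h_alpha2) / entropy(w_reg)
diff1 <- (h_alpha2 - h_alpha1) / (entropy(w_pop) - entropy(w_reg))

print(round(c(d_gamma, d_alpha2, d_alpha1, d_beta2, d_beta1, diff2, diff1), 3))
```

The denominator of `diff1` uses the identity that the within-region maximum equals the entropy of the population weights minus the entropy of the region weights.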
R code for Information-based Diversity Partitioning for Any Number of Levels
# input data:
#   abun: alleles-by-population frequency matrix (or species-by-community frequency matrix)
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix)
# output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level
#   (2) Proportion of total beta information found at each level
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.
# Main function iDIP
iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  # gamma (ecosystem-level) entropy from pooled frequencies
  ga = rowSums(abun)
  gp = ga[ga > 0] / n
  G = sum(-gp * log(gp))
  S = length(gp)
  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)
  # lowest level: population weights and alpha entropy
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H - 1] = sum(wi * Ai)
  # intermediate levels: pool populations that belong to the same aggregate
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          ai[, j] = abun[, II]
        } else {
          ai[, j] = rowSums(abun[, II])
        }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i - 1] = sum(wi * Ai)
    }
  }
  # differentiation, proportion of beta information and beta diversity per level
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }
  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha", (H - 1):1),
                     paste0("D_beta", (H - 1):1),
                     paste0("Proportion", (H - 1):1),
                     paste0("Differentiation", (H - 1):1))
  return(out)
}
Phylogenetic Diversity (R function iDIPphylo)
In addition to the two data matrices (called "Abun" and "Struc", respectively) described for species diversity, we also need to input a phylogenetic "Tree" in Newick tree format.
(a) Abun: specifying species/allele (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for the species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.
Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.
# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) = paste0("Allele",1:6)
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")
# Run R function
iDIPphylo(Data, Struc, Tree)
Output is given below:
[,1]
Faith's PD 321.525
mean_T 94.169
PD_gamma 274.388
PD_alpha2 255.194
PD_alpha1 223.231
PD_beta2 1.075
PD_beta1 1.143
PD_prop2 0.351
PD_prop1 0.649
PD_diff2 0.124
PD_diff1 0.149
The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.
(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 ≈ 274.388 (= PD_gamma).
(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 ≈ 255.194 (= PD_alpha2).
(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
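The first three output values can be reproduced without any tree-handling package by writing the example tree as a list of branch lengths plus a branch-by-tip membership matrix. This is a simplified sketch for this one tree only; the matrix `M` and the variable names are illustrative assumptions, and the gamma formula is the order-1 phylogenetic diversity based on node abundances divided by the mean branch length.

```r
# Example data: alleles in rows, populations in columns
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))

# Branch lengths read from the Newick string:
# six tip branches, then the branches above (Allele1, Allele2),
# (Allele1, Allele2, Allele3) and (Allele4, Allele5, Allele6)
L <- c(16.66254448, 28.86156926, 59.19367445, 9.67060281, 49.65919121, 15.361314,
       43.70264926, 43.49065302, 54.92297125)

# Branch-by-tip membership matrix (1 = tip descends from that branch)
M <- rbind(diag(6),
           c(1, 1, 0, 0, 0, 0),
           c(1, 1, 1, 0, 0, 0),
           c(0, 0, 0, 1, 1, 1))

# Pooled allele frequencies and the abundance of each branch
p <- rowSums(Data) / sum(Data)
a <- as.vector(M %*% p)

faith_pd <- sum(L[a > 0])      # total branch length
mean_T <- sum(L * a)           # abundance-weighted mean root-to-tip distance

# Phylogenetic diversity of order 1 at the ecosystem level
pd_gamma <- exp(-sum(L * (a / mean_T) * log(a / mean_T)))

print(round(c(faith_pd, mean_T, pd_gamma), 3))
```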
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
# input data:
#   abun: alleles-by-population frequency matrix (or species-by-community frequency matrix)
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix)
#   tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA"
#       in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.
# output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root node
#       to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta
#       information found in Level 1 and Level 2, respectively
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example,
#       PD_diff1 measures the mean phylogenetic dissimilarity among communities (Level 1)
#       within a region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity
#       among regions (Level 2), etc.
# Three packages ("ade4", "ape" and "phytools") must be installed first.
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)
# Main function iDIPphylo
iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, ] marks every branch on the path from tip i to the root
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }
  # pA: branch abundances per population; pB: branch lengths
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
  pB = c(phyloData$leaves, phyloData$nodes)
  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n; TT = sum(gp * pB)
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])
  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H - 1] = sum(wi * Ai)
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else {
          pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
        }
      }
      wi = ni / sum(ni)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i - 1] = sum(wi * Ai)
    }
  }
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }
  # exp(G) and exp(A) are already in units of branch length (effective total branch length)
  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H - 1):1),
                     paste0("PD_beta", (H - 1):1),
                     paste0("PD_prop", (H - 1):1),
                     paste0("PD_diff", (H - 1):1))
  return(out)
}
SherwinWBJabotFRushRampRossettoM (2006)Measurementof biological information with applications from genes to land-scapes Molecular Ecology 15(10) 2857ndash2869 httpsdoiorg101111j1365-294X200602992x
SlatkinM ampHudson R R (1991) Pairwise comparisons ofmitochon-drialDNAsequencesinstableandexponentiallygrowingpopulationsGenetics 129(2)555ndash562
SmousePEWhiteheadMRampPeakallR(2015)Aninformationaldi-versityframeworkillustratedwithsexuallydeceptiveorchidsinearlystages of speciationMolecular Ecology Resources 15(6) 1375ndash1384httpsdoiorg1011111755-099812422
VellendMLajoieGBourretAMurriaCKembelSWampGarantD(2014)Drawingecologicalinferencesfromcoincidentpatternsofpop-ulation- and community-level biodiversityMolecular Ecology 23(12)2890ndash2901httpsdoiorg101111mec12756
deVillemereuil P Frichot E Bazin E Francois O amp Gaggiotti O E(2014)GenomescanmethodsagainstmorecomplexmodelsWhenand how much should we trust them Molecular Ecology 23(8)2006ndash2019httpsdoiorg101111mec12705
WeirBSampCockerhamCC(1984)EstimatingF-statisticsfortheanaly-sisofpopulationstructureEvolution 38(6)1358ndash1370
WhitlockMC(2011)GrsquoSTandDdonotreplaceF-STMolecular Ecology 20(6)1083ndash1091httpsdoiorg101111j1365-294X201004996x
Williams IDBaumJKHeenanAHansonKMNadonMOampBrainard R E (2015)Human oceanographic and habitat drivers ofcentral and western Pacific coral reef fish assemblages PLoS One 10(4)e0120516ltGotoISIgtWOS000352135600033httpsdoiorg101371journalpone0120516
WoldaH(1981)SimilarityindexessamplesizeanddiversityOecologia 50(3)296ndash302
WrightS(1951)ThegeneticalstructureofpopulationsAnnals of Eugenics 15(4)323ndash354
SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593
APPENDIX S1: Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an “ideal” population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the “effective” number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
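The equivalence described in Appendix S1 can be checked numerically. The following is a minimal Python sketch, not part of the original supplement; the function names are illustrative, and it assumes the standard relations He = 1 − Σ p_i² and effective number = 1 / Σ p_i² (the number of equally frequent alleles giving the same He).

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def effective_number_of_alleles(freqs):
    """Number of equally frequent alleles that would give the same He."""
    return 1.0 / sum(p * p for p in freqs)


# Allele frequencies from the table above (A1..A5).
pop1 = [0.5, 0.5, 0.0, 0.0, 0.0]
pop2 = [0.01, 0.10, 0.665, 0.215, 0.01]

for name, pop in (("Population 1", pop1), ("Population 2", pop2)):
    he = expected_heterozygosity(pop)
    ne = effective_number_of_alleles(pop)
    print(f"{name}: He = {he:.3f}, effective number of alleles = {ne:.2f}")
```

Both populations give an expected heterozygosity close to 0.5 and an effective number close to two, even though population 2 carries five distinct alleles.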
APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol: Definition
${}^{q}D$: abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$: phylogenetic diversity of order q
$S$: total number of distinct elements (species/alleles)
$H$: Shannon entropy
$K$: number of regions
$J_k$: number of local populations/communities within region k
$w_{jk}$: weight given to population/community j of region k
$w_{+k}$: weight given to region k
$D_\alpha^{(l)}$: alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$: beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$: gamma abundance diversity at level l of the hierarchy
$l$: superscript to denote population/community, subregion, region, …
$D_\gamma$: gamma abundance diversity at the ecosystem level
$N_{injk}$: number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$: total number of copies of allele/species i in population/community j of region k
$N_{+jk}$: total number of alleles/individuals in population j of region k
$N_{++k}$: total number of alleles/individuals in region k
$N_{+++}$: total number of alleles/individuals in the ecosystem
$p_{i|jk}$: frequency of allele/species i in population j of region k
$p_{i|+k}$: frequency of allele/species i in region k
$p_{i|++}$: frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$: alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$: alpha entropy for region k
$H_\alpha^{(l)}$: total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$: abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$: number of branch segments in the phylogenetic tree
$L_i$: length of branch i = 1, 2, …, B
$a_i$: total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$: mean branch length
$T$: depth of an ultrametric tree ($= \bar{T}$)
$Q$: Rao's quadratic entropy
$I$: phylogenetic entropy
$a_{i|jk}$: total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$: total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$: total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$: alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$: beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$: gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$: gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$: alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$: alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$: total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$: phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)
APPENDIX S3: Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but all derivations are equally valid for species diversity; it suffices to replace the term “allele frequency” with “species frequency”. We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j\,p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+}\ln p_{i|+} = -\sum_{i=1}^{S}\Big(\sum_{j=1}^{J} w_j\,p_{i|j}\Big)\ln\Big(\sum_{m=1}^{J} w_m\,p_{i|m}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j}\ln p_{i|j}.$$
Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.
Proof. Since $f(x) = -x\log x$ is a concave function, it follows from the Jensen inequality that, for any allele i,

$$-\Big(\sum_{j=1}^{J} w_j\,p_{i|j}\Big)\ln\Big(\sum_{j=1}^{J} w_j\,p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j\,p_{i|j}\ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S}\Big(\sum_{j=1}^{J} w_j\,p_{i|j}\Big)\ln\Big(\sum_{j=1}^{J} w_j\,p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j}\ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows. Because $\sum_{m} w_m\,p_{i|m} \ge w_j\,p_{i|j}$ for every j and the logarithm is increasing,

$$H_\gamma = -\sum_{i=1}^{S}\sum_{j=1}^{J} w_j\,p_{i|j}\ln\Big(\sum_{m=1}^{J} w_m\,p_{i|m}\Big) \le -\sum_{i=1}^{S}\sum_{j=1}^{J} w_j\,p_{i|j}\ln\big(w_j\,p_{i|j}\big)$$

$$= -\sum_{j=1}^{J} w_j\sum_{i=1}^{S} p_{i|j}\ln p_{i|j} - \sum_{j=1}^{J} w_j\sum_{i=1}^{S} p_{i|j}\ln w_j = H_\alpha - \sum_{j=1}^{J} w_j\ln w_j.$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \qquad \text{(C1)}$$
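Eq. (C1) and the bounds of Theorem S3.1 can be explored with a short numerical check. The following Python sketch is illustrative only (the function names and the example frequencies are assumptions, not part of the original supplement): it computes the Shannon differentiation for identical populations, for completely distinct populations, and for an intermediate case in which two equally weighted populations each hold four equally common alleles, one of them shared.

```python
import math


def shannon(p):
    """Shannon entropy -sum(p ln p), ignoring zero entries."""
    return -sum(x * math.log(x) for x in p if x > 0)


def shannon_differentiation(pops, weights):
    """Eq. (C1): (H_gamma - H_alpha) / (-sum_j w_j ln w_j).

    pops    : list of allele relative-frequency lists (each sums to 1)
    weights : population weights (sum to 1)
    """
    n_alleles = len(pops[0])
    pooled = [sum(w * p[i] for w, p in zip(weights, pops)) for i in range(n_alleles)]
    h_gamma = shannon(pooled)
    h_alpha = sum(w * shannon(p) for w, p in zip(weights, pops))
    return (h_gamma - h_alpha) / shannon(weights)


# Identical populations: differentiation 0 for any weights.
identical = shannon_differentiation([[0.2, 0.8, 0.0, 0.0], [0.2, 0.8, 0.0, 0.0]], [0.3, 0.7])

# Completely distinct populations (no shared alleles): differentiation 1.
distinct = shannon_differentiation([[0.2, 0.8, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]], [0.3, 0.7])

# Two equally weighted populations, four equally common alleles each, one shared.
pop_a = [0.25, 0.25, 0.25, 0.25, 0.0, 0.0, 0.0]
pop_b = [0.25, 0.0, 0.0, 0.0, 0.25, 0.25, 0.25]
one_shared = shannon_differentiation([pop_a, pop_b], [0.5, 0.5])

print(identical, distinct, one_shared)
```

In the intermediate case three of the four alleles in each population are unshared, and the measure returns that proportion.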
The measure in Eq. (C1) takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2 (Desirable monotonicity and “true dissimilarity” properties for Shannon differentiation).
(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation also satisfies the “true dissimilarity” property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.

Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++}\ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \qquad \text{(C2)}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the
latter two components is derived, so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k}\ln w_{+k}, \qquad \text{(C3)}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}}. \qquad \text{(C4)}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\ln w_{+k}$.
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions, with relative frequency $p_{i|+k}$ for allele i in region k and region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk} = \sum_{k=1}^{K} w_{+k}\,p_{i|+k}.$$

Then the “gamma” entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S}\Big(\sum_{k=1}^{K} w_{+k}\,p_{i|+k}\Big)\ln\Big(\sum_{m=1}^{K} w_{+m}\,p_{i|+m}\Big).$$

The corresponding “alpha” entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k}\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.
To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations, with relative frequency $p_{i|jk}$ for allele i in population j and population weights $(w_{1k}/w_{+k},\ w_{2k}/w_{+k},\ \ldots,\ w_{J_k k}/w_{+k})$, j = 1, 2, …, $J_k$. The “gamma” entropy for this two-level hierarchy is the entropy value for region k, i.e.,

$$H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k}, \qquad \text{where } p_{i|+k} = \sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\,p_{i|jk}.$$

The corresponding “alpha” entropy is

$$\sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\,H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\,H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k} + \sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk} \le -\sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\ln\frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k}\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k} + \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}}.$$
This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k}\ln w_{+k}}. \qquad \text{(C5)}$$
Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\big(w_{jk}/w_{+k}\big)}. \qquad \text{(C6)}$$
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case each allele occurs in a single population, so we can decompose the gamma entropy as

$$H_\gamma = -\sum_{i=1}^{S}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln\big(w_{jk}\,p_{i|jk}\big)$$

$$= -\sum_{i=1}^{S}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\Big[\ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big]$$

$$= -\sum_{i=1}^{S}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln p_{i|jk} - \sum_{i=1}^{S}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln\frac{w_{jk}}{w_{+k}} - \sum_{i=1}^{S}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln w_{+k}$$

$$= H_\alpha^{(1)} - \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k}\ln w_{+k}$$

$$\equiv H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big].$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\ln w_{+k}$.
As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and “true dissimilarity” properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i\,a_{i|++}\ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\sum_{i=1}^{B} L_i\,a_{i|+k}\ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\sum_{i=1}^{B} L_i\,a_{i|jk}\ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[I_\alpha^{(2)} - I_\alpha^{(1)}\big] + \big[I_\gamma - I_\alpha^{(2)}\big]. \qquad \text{(C7)}$$
Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem, so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T\sum_{k=1}^{K} w_{+k}\ln w_{+k}, \qquad \text{(C8)}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}}. \qquad \text{(C9)}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln(w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T\sum_{k=1}^{K} w_{+k}\ln w_{+k}$.
The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and “true dissimilarity” properties, are parallel to those of allelic diversity and thus are omitted.
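The role of the tree depth T in the upper bounds can be illustrated with a two-level analogue of Theorem S3.4. The Python sketch below is an illustration under stated assumptions, not the authors' code: the four-tip ultrametric tree, its unit branch lengths and the function names are all hypothetical.

```python
import math

# Hypothetical ultrametric tree of depth T = 2 with four tips (indices 0..3).
# Each branch is (length L_i, set of tips descended from that branch).
BRANCHES = [
    (1.0, {0}), (1.0, {1}), (1.0, {2}), (1.0, {3}),  # terminal branches
    (1.0, {0, 1}), (1.0, {2, 3}),                    # internal branches
]
TREE_DEPTH = 2.0


def phylo_entropy(p):
    """Phylogenetic entropy I = -sum_i L_i a_i ln a_i for tip frequencies p."""
    total = 0.0
    for length, tips in BRANCHES:
        a = sum(p[t] for t in tips)  # abundance descended from this branch
        if a > 0:
            total -= length * a * math.log(a)
    return total


def phylo_differentiation(pops, weights):
    """(I_gamma - I_alpha) divided by its maximum, -T * sum_j w_j ln w_j."""
    n_tips = len(pops[0])
    pooled = [sum(w * p[i] for w, p in zip(weights, pops)) for i in range(n_tips)]
    i_gamma = phylo_entropy(pooled)
    i_alpha = sum(w * phylo_entropy(p) for w, p in zip(weights, pops))
    max_gap = -TREE_DEPTH * sum(w * math.log(w) for w in weights if w > 0)
    return (i_gamma - i_alpha) / max_gap


weights = [0.5, 0.5]
# Populations occupying different clades share no branches at all.
distinct = phylo_differentiation([[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]], weights)
# Identical populations share every branch.
identical = phylo_differentiation([[0.25] * 4, [0.25] * 4], weights)
print(distinct, identical)
```

With no shared branches the gap between gamma and alpha entropies reaches T times the weight entropy, so the normalized measure equals 1; with identical populations it equals 0.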
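SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represents the ecosystem. Shannon entropies ($H$) are calculated from allele/species abundances at each level of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 is not reproduced here. Rows: level 1 (Population or Community), level 2 (Region), level 3 (Ecosystem). Columns: Within, Between, Total, Decomposition. Labelled quantities: population frequencies $p_{i|11}, p_{i|21}, p_{i|31}, p_{i|42}, p_{i|52}$ with entropies $H_{\alpha 11}^{(1)}, H_{\alpha 21}^{(1)}, H_{\alpha 31}^{(1)}, H_{\alpha 42}^{(1)}, H_{\alpha 52}^{(1)}$; region frequencies $p_{i|+1}, p_{i|+2}$ with entropies $H_{\alpha 1}^{(2)}, H_{\alpha 2}^{(2)}$; ecosystem frequency $p_{i|++}$; totals $H_\alpha^{(1)}$, $H_\alpha^{(2)}$ and $H_\gamma$.]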
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances, and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set, where the first elements (five in the present example) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for the 8 nodes is as follows: $(a_1, a_2, \ldots, a_5, a_6, a_7, a_8) = (p_1, p_2, \ldots, p_5,\ p_1 + p_2,\ p_1 + p_2 + p_3,\ p_4 + p_5)$, where $p_i$ is the relative abundance of species i.

[Figure S2 is not reproduced here. Tips: $p_1$ to $p_5$. Internal nodes: $p_1 + p_2$, $p_4 + p_5$, $p_1 + p_2 + p_3$. Root: MRCA.]
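The extended abundance set of Figure S2 can be written out explicitly. The following Python sketch is only an illustration: the tip abundances are hypothetical values, and the internal nodes are encoded as the sets of tips given in the caption.

```python
# Internal nodes of the Figure S2 tree as tuples of 0-based tip indices:
# (p1 + p2), (p1 + p2 + p3), (p4 + p5).
INTERNAL_NODES = [(0, 1), (0, 1, 2), (3, 4)]


def extended_abundances(p):
    """Return tip abundances followed by the abundance descended from each internal node."""
    internal = [sum(p[t] for t in node) for node in INTERNAL_NODES]
    return list(p) + internal


# Hypothetical relative abundances of the five species/alleles (they sum to 1).
p = [0.10, 0.20, 0.30, 0.25, 0.15]
a = extended_abundances(p)
for index, value in enumerate(a, start=1):
    print(f"a_{index} = {value:.2f}")
```

The entries of the extended set do not sum to 1, because each tip contributes to every node on its path to the root; this is why the branch lengths enter the entropy as weights.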
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R function iDIP)

Input should include two data matrices (called “Abun” and “Struc”, respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1: Population 1, Population 2
    Region 2: Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

           Pop1   Pop2   Pop3   Pop4   Pop5
Allele1       1     16      2     10     15
Allele2       0      0      0      5     14
Allele3       7     12     11      1      0
Allele4       0      5     14      1     21
Allele5       2      1      0     11     10
Allele6       0      1      3      2      0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure of any number of levels can be expressed in a similar manner.

          Pop1          Pop2          Pop3          Pop4          Pop5
Level3    Ecosystem     Ecosystem     Ecosystem     Ecosystem     Ecosystem
Level2    Region1       Region1       Region2       Region2       Region2
Level1    Population1   Population2   Population3   Population4   Population5

For the above simple example, the input data for R function iDIP (code given below) are shown below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    Struc = rbind(rep("Ecosystem", 5), c(rep("Region1", 2), rep("Region2", 3)), paste0("Population", 1:5))

    # Run R function
    iDIP(Data, Struc)
Output is given below.

                        [,1]
    D_gamma            5.272
    D_alpha2           4.679
    D_alpha1           3.540
    D_beta2            1.127
    D_beta1            1.322
    Proportion2        0.300
    Proportion1        0.700
    Differentiation2   0.204
    Differentiation1   0.310
We give simple interpretations for the above output (“effective” is always in the sense of equally abundant alleles/populations/regions).
(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 × 1.127 ≈ 5.272 (= D_gamma).
(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 × 3.540 ≈ 4.679 allele equivalents per region (= D_alpha2).
(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
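The reported output can be cross-checked independently of the R function. The following is a minimal Python sketch for the three-level case only, written from Eqs. (C2), (C5) and (C6) with population weights taken as relative population sizes (as the R code does); the function and variable names are illustrative and are not part of the original supplement.

```python
import math


def entropy(counts):
    """Shannon entropy of the relative frequencies implied by raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def three_level_partition(abun, region_of):
    """abun: one list of allele counts per population; region_of: region label per population."""
    n_alleles = len(abun[0])
    pop_sizes = [sum(pop) for pop in abun]
    n = sum(pop_sizes)
    ecosystem = [sum(pop[i] for pop in abun) for i in range(n_alleles)]
    regions = {}
    for pop, label in zip(abun, region_of):  # pool counts within each region
        pooled = regions.setdefault(label, [0] * n_alleles)
        for i, c in enumerate(pop):
            pooled[i] += c
    h_gamma = entropy(ecosystem)
    h_alpha2 = sum(sum(r) / n * entropy(r) for r in regions.values())
    h_alpha1 = sum(size / n * entropy(pop) for size, pop in zip(pop_sizes, abun))
    w_region = entropy([sum(r) for r in regions.values()])  # -sum w_{+k} ln w_{+k}
    w_pop = entropy(pop_sizes)                              # -sum w_{jk} ln w_{jk}
    return {
        "D_gamma": math.exp(h_gamma),
        "D_alpha2": math.exp(h_alpha2),
        "D_alpha1": math.exp(h_alpha1),
        "D_beta2": math.exp(h_gamma - h_alpha2),
        "D_beta1": math.exp(h_alpha2 - h_alpha1),
        "Proportion2": (h_gamma - h_alpha2) / (h_gamma - h_alpha1),
        "Proportion1": (h_alpha2 - h_alpha1) / (h_gamma - h_alpha1),
        "Differentiation2": (h_gamma - h_alpha2) / w_region,
        "Differentiation1": (h_alpha2 - h_alpha1) / (w_pop - w_region),
    }


# Example data from the text: one list of six allele counts per population.
abun = [
    [1, 0, 7, 0, 2, 0],
    [16, 0, 12, 5, 1, 1],
    [2, 0, 11, 14, 0, 3],
    [10, 5, 1, 1, 11, 2],
    [15, 14, 0, 21, 10, 0],
]
region_of = ["Region1", "Region1", "Region2", "Region2", "Region2"]

result = three_level_partition(abun, region_of)
for name, value in result.items():
    print(f"{name:18s}{value:.3f}")
```

The denominator of Differentiation1 uses the identity $-\sum_k\sum_j w_{jk}\ln(w_{jk}/w_{+k}) = -\sum_k\sum_j w_{jk}\ln w_{jk} + \sum_k w_{+k}\ln w_{+k}$, which is the same difference of weight entropies that the R code forms as W[i] - W[i-1].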
R code for Information-based Diversity Partitioning for Any Number of Levels

    # Input data:
    #   abun:  alleles-by-population frequency matrix data
    #          (or species-by-community frequency matrix data)
    #   struc: level-by-population hierarchical structure matrix
    #          (or level-by-community hierarchical structure matrix)
    # Output:
    #   (1) Gamma (or total) diversity, alpha and beta diversity at each level
    #   (2) Proportion of total beta information found at each level
    #   (3) Differentiation (dissimilarity) for each level. For example,
    #       Differentiation1 measures the mean dissimilarity among populations
    #       (level 1) within a region; Differentiation2 measures the mean
    #       dissimilarity among regions (level 2), etc.
    # NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace
    #       "blank" or "NA" in your data by 0 or any numerical value. Also, there
    #       must be at least one species/allele in a population or a community.

    # Main function iDIP
    iDIP = function(abun, struc) {
      n = sum(abun); N = ncol(abun)
      ga = rowSums(abun)
      gp = ga[ga > 0] / n
      G = sum(-gp * log(gp))              # gamma entropy
      S = length(gp)
      H = nrow(struc)                     # number of hierarchical levels
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      # Lowest level (populations): weights, weight entropy and alpha entropy
      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[H - 1] = sum(wi * Ai)

      # Intermediate levels: pool the populations belonging to each aggregate
      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          ai = matrix(0, ncol = NN, nrow = nrow(abun))
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              ai[, j] = abun[, II]
            } else {
              ai[, j] = rowSums(abun[, II])
            }
          }
          pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
          wi = colSums(ai) / sum(ai)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
          A[i - 1] = sum(wi * Ai)
        }
      }

      total = G - A[H - 1]                # total beta information
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
      out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) <- c(paste0("D_gamma"),
                         paste0("D_alpha", (H - 1):1),
                         paste0("D_beta", (H - 1):1),
                         paste0("Proportion", (H - 1):1),
                         paste0("Differentiation", (H - 1):1))
      return(out)
    }
Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called “Abun” and “Struc”, respectively) as described for the species diversity, we also need to input a phylogenetic “Tree” (in Newick tree format).
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the “Abun” matrix should be exactly the same as those in the uploaded Newick tree format. iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see the simple example given above for the species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity for illustration. A simulated phylogenetic tree for 6 species is given below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    rownames(Data) = paste0("Allele", 1:6)
    Struc = rbind(rep("Ecosystem", 5), c(rep("Region1", 2), rep("Region2", 3)), paste0("Community", 1:5))
    Tree = c("(((Allele1:1.666254448,Allele2:2.886156926):4.370264926,Allele3:5.919367445):4.349065302,(Allele4:0.967060281,Allele5:4.965919121,Allele6:1.5361314):5.492297125);")

    # Run R function
    iDIPphylo(Data, Struc, Tree)
Output is given below.

                     [,1]
    Faith's PD    32.1525
    mean_T         9.4169
    PD_gamma      27.4388
    PD_alpha2     25.5194
    PD_alpha1     22.3231
    PD_beta2       1.075
    PD_beta1       1.143
    PD_prop2       0.351
    PD_prop1       0.649
    PD_diff2       0.124
    PD_diff1       0.149
The “effective” in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.
(1) The total branch length (Faith's PD) in the phylogenetic tree is 32.1525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 9.4169.
(3a) PD_gamma = 27.4388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 27.4388.
(3b) PD_alpha2 = 25.5194 means that the effective total branch length per region is 25.5194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 25.5194 × 1.075 ≈ 27.4388 (= PD_gamma).
(3c) PD_alpha1 = 22.3231 means that the effective total branch length per population within each region is 22.3231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 22.3231 × 1.143 ≈ 25.5194 (= PD_alpha2).
(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

    # Input data:
    #   abun:  alleles-by-population frequency matrix data
    #          (or species-by-community frequency matrix data)
    #   struc: level-by-population hierarchical structure matrix
    #          (or level-by-community hierarchical structure matrix)
    #   tree:  a Newick-format phylogenetic tree spanned by all focal species
    #          considered in a study
    # NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank"
    #       or "NA" in your data by 0 or any numerical value. Also, there must be
    #       at least one species/allele in a population or a community.
    # Output:
    #   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
    #   (2) mean_T: weighted (by species abundance) mean of the distances from
    #       root node to each of the tips in a phylogenetic tree. For an
    #       ultrametric tree, mean_T = tree depth.
    #   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and
    #       beta PD for each level
    #   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic
    #       beta information found in Level 1 and Level 2, respectively
    #   (5) Phylogenetic differentiation (dissimilarity) for each level. For
    #       example, PD_diff1 measures the mean phylogenetic dissimilarity among
    #       communities (Level 1) within a region (Level 2); PD_diff2 measures
    #       the mean phylogenetic dissimilarity among regions (Level 2), etc.

    # Three packages, "ade4", "ape" and "phytools", must be installed first
    install.packages("ade4")
    library(ade4)
    install.packages("ape")
    library(ape)
    install.packages("phytools")
    library(phytools)

    # Main function iDIPphylo
    iDIPphylo = function(abun, struc, tree) {
      phyloData <- newick2phylog(tree)
      Temp <- as.matrix(abun[names(phyloData$leaves), ])
      nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
      # M[tip, node] = 1 when the node lies on the path from the root to the tip
      M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
                 dimnames = list(names(phyloData$leaves), nodenames))
      for (i in 1:length(phyloData$leaves)) {
        M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
      }
      # pA: abundance descended from each node, per population
      pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
                  dimnames = list(nodenames, colnames(abun)))
      for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
      pB = c(phyloData$leaves, phyloData$nodes)   # branch lengths
      n = sum(abun); N = ncol(abun)
      ga = rowSums(pA)
      gp = ga / n
      TT = sum(gp * pB)                           # mean_T
      G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
      PD = sum(pB[gp > 0])                        # Faith's PD

      H = nrow(struc)
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)
      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[H - 1] = sum(wi * Ai)

      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
            } else {
              pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
            }
          }
          # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
          wi = ni / sum(ni)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
          A[i - 1] = sum(wi * Ai)
        }
      }

      total = G - A[H - 1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      Gamma = exp(G) * TT; Alpha = exp(A) * TT; Diff = Diff; Prop = Prop
      # Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
      out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
      # out1 = iDIP(abun, struc); out = cbind(out1, out2)
      rownames(out) <- c(paste("Faith's PD"), paste("mean_T"),
                         paste0("PD_gamma"),
                         paste0("PD_alpha", (H - 1):1),
                         paste0("PD_beta", (H - 1):1),
                         paste0("PD_prop", (H - 1):1),
                         paste0("PD_diff", (H - 1):1))
      return(out)
    }
APPENDIX S1. Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
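The equivalence described above can be checked numerically. The sketch below assumes the standard relations He = 1 - sum(p^2) and effective number of alleles = 1 / (1 - He); the function names `expected_het` and `effective_alleles` are illustrative and do not come from the original material.

```r
# Allele frequencies from the Appendix S1 table
pop1 <- c(A1 = 0.5, A2 = 0.5, A3 = 0, A4 = 0, A5 = 0)
pop2 <- c(A1 = 0.01, A2 = 0.10, A3 = 0.665, A4 = 0.215, A5 = 0.01)

# Expected heterozygosity: He = 1 - sum(p^2)
expected_het <- function(p) 1 - sum(p^2)

# Effective number of alleles: 1 / (1 - He) = 1 / sum(p^2)
effective_alleles <- function(p) 1 / sum(p^2)

he <- c(pop1 = expected_het(pop1), pop2 = expected_het(pop2))
ne <- c(pop1 = effective_alleles(pop1), pop2 = effective_alleles(pop2))

print(round(he, 3))
print(round(ne, 2))
```

Both populations give He close to 0.5 and an effective number of alleles close to two, even though population 2 carries five distinct alleles.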
APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol                      Definition
${}^{q}D$                   abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$                  phylogenetic diversity of order q
$S$                         total number of distinct elements (species/alleles)
$H$                         Shannon entropy
$K$                         number of regions
$J_k$                       number of local populations/communities within region k
$w_{jk}$                    weight given to population/community j of region k
$w_{+k}$                    weight given to region k
$D_\alpha^{(l)}$            alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$             beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$            gamma abundance diversity at level l of the hierarchy
$l$                         superscript to denote population/community, subregion, region, …
$D_\gamma$                  gamma abundance diversity at the ecosystem level
$N_{injk}$                  number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$                   total number of copies of allele/species i in population/community j of region k
$N_{+jk}$                   total number of alleles/individuals in population j of region k
$N_{++k}$                   total number of alleles/individuals in region k
$N_{+++}$                   total number of alleles/individuals in the ecosystem
$p_{i|jk}$                  frequency of allele/species i in population j of region k
$p_{i|+k}$                  frequency of allele/species i in region k
$p_{i|++}$                  frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$       alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$        alpha entropy for region k
$H_\alpha^{(l)}$            total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$            abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$                         number of branch segments in the phylogenetic tree
$L_i$                       length of branch i, i = 1, 2, 3, …, B
$a_i$                       total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$                   mean branch length
$T$                         depth of an ultrametric tree ($= \bar{T}$)
$Q$                         Rao's quadratic entropy
$I$                         phylogenetic entropy
$a_{i|jk}$                  total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$                  total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$                  total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$           alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$            beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$           gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$                 gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$       alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$        alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$            total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$         phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)
APPENDIX S3. Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

\[ H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{m=1}^{J} w_m p_{i|m} \Big) \]

and

\[ H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}. \]

Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

\[ 0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j. \]

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \ln x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

\[ -\Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}. \]

Summing over all alleles, we then obtain

\[ -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}. \]

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows. Because $\sum_{m=1}^{J} w_m p_{i|m} \ge w_j p_{i|j}$ for every j and the logarithm is increasing,

\[ H_\gamma = -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{m=1}^{J} w_m p_{i|m} \Big) \le -\sum_{i=1}^{S} \sum_{j=1}^{J} w_j p_{i|j} \ln \big( w_j p_{i|j} \big) \]

\[ = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j. \]

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
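The two bounds in Theorem S3.1 can be illustrated numerically. The following is a minimal sketch in base R, not part of the original code; the helper names `shannon` and `entropy_gap`, the weights, and the three test matrices are all made up for illustration.

```r
# Shannon entropy (natural log), ignoring zero frequencies
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# P: alleles-by-populations matrix of relative frequencies (columns sum to 1)
# w: population weights (sum to 1)
entropy_gap <- function(P, w) {
  H_gamma <- shannon(as.vector(P %*% w))      # entropy of the pooled frequencies
  H_alpha <- sum(w * apply(P, 2, shannon))    # weighted mean within-population entropy
  c(gap = H_gamma - H_alpha, upper = shannon(w))
}

w <- c(0.2, 0.3, 0.5)

# Same frequency distribution in every population: the gap should be 0
identical_pops <- matrix(c(0.1, 0.6, 0.3), nrow = 3, ncol = 3)

# No shared alleles: the gap should reach the upper bound -sum(w * log(w))
distinct_pops <- rbind(c(0.5, 0, 0), c(0.5, 0, 0),   # alleles found only in population 1
                       c(0, 1, 0),                    # allele found only in population 2
                       c(0, 0, 0.7), c(0, 0, 0.3))    # alleles found only in population 3

# Partially overlapping populations: the gap lies strictly between the bounds
mixed_pops <- cbind(c(0.7, 0.2, 0.1), c(0.1, 0.8, 0.1), c(0.3, 0.3, 0.4))

res_identical <- entropy_gap(identical_pops, w)
res_distinct  <- entropy_gap(distinct_pops, w)
res_mixed     <- entropy_gap(mixed_pops, w)
print(rbind(res_identical, res_distinct, res_mixed))
```

The ratio `gap / upper` is the normalized quantity that the theorem motivates, so the three rows correspond to differentiation values of 0, 1, and an intermediate value.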
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

\[ \Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1} \]
This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance.
Here the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: if multiple communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 - A/S, the true proportion of non-shared species in a community.
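Property (B) can be checked on a constructed example. The sketch below builds N communities with S equally common species, A of which are shared by all, and evaluates Eq. (C1). Equal community weights are an assumption made here for illustration, and `true_dissimilarity_check` is a hypothetical helper name.

```r
# Shannon entropy (natural log), ignoring zero frequencies
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# N communities, each with S equally common species, A of them shared by all
true_dissimilarity_check <- function(N, S, A) {
  unshared <- S - A
  P <- matrix(0, nrow = A + N * unshared, ncol = N)
  if (A > 0) P[seq_len(A), ] <- 1 / S              # shared species occur everywhere
  for (j in seq_len(N)) {
    rows <- A + (j - 1) * unshared + seq_len(unshared)
    P[rows, j] <- 1 / S                            # species unique to community j
  }
  w <- rep(1 / N, N)                               # equal weights (assumption)
  H_gamma <- shannon(as.vector(P %*% w))
  H_alpha <- sum(w * apply(P, 2, shannon))
  (H_gamma - H_alpha) / shannon(w)                 # Eq. (C1)
}

d_3_5_2 <- true_dissimilarity_check(N = 3, S = 5, A = 2)   # expected 1 - 2/5
d_2_4_1 <- true_dissimilarity_check(N = 2, S = 4, A = 1)   # expected 1 - 1/4
print(c(d_3_5_2, d_2_4_1))
```

With these settings the gap between gamma and alpha entropy works out to ((S - A)/S) ln N, so dividing by ln N gives 1 - A/S.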
Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

\[ H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}, \]

\[ H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}. \]

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

\[ H_\gamma = H_\alpha^{(1)} + \big[ H_\alpha^{(2)} - H_\alpha^{(1)} \big] + \big[ H_\gamma - H_\alpha^{(2)} \big]. \tag{C2} \]

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in the three-level hierarchy, we have the following inequalities:

\[ 0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3} \]

\[ 0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C4} \]

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have

\[ H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} \quad \text{and} \quad H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}. \]
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

\[ p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}. \]

Then the "gamma" entropy for this two-level system is

\[ H_\gamma = -\sum_{i=1}^{S} \Big( \sum_{k=1}^{K} w_{+k} p_{i|+k} \Big) \ln \Big( \sum_{m=1}^{K} w_{+m} p_{i|+m} \Big). \]

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele i in population j, with population weights $\big( w_{1k}/w_{+k},\, w_{2k}/w_{+k},\, \ldots,\, w_{J_k k}/w_{+k} \big)$, $j = 1, 2, \ldots, J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} p_{i|jk}$. The corresponding "alpha" entropy is

\[ \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}. \]

Then Theorem S3.1 leads to

\[ H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}. \]

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

\[ -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \]

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

\[ \Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5} \]
Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

\[ \Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \big( w_{jk} / w_{+k} \big)}. \tag{C6} \]
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

\[ H_\gamma = -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \big( w_{jk} p_{i|jk} \big) \]

\[ = -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big[ \ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k} \Big] \]

\[ = -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk} - \sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k} \]

\[ = H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \]

\[ \equiv H_\alpha^{(1)} + \big[ H_\alpha^{(2)} - H_\alpha^{(1)} \big] + \big[ H_\gamma - H_\alpha^{(2)} \big]. \]

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
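Eqs. (C5) and (C6) can be evaluated directly from an abundance matrix. The sketch below is an illustrative base-R implementation, not the authors' iDIP function: `three_level_diff` is a hypothetical name, the weights are taken to be relative population sizes (one of the options the text allows), and the denominator of (C6) is computed as the difference of the two weight entropies, which is algebraically the same quantity. The allele counts are those of the worked example in the Supplementary Information; the second matrix is a made-up case with no shared alleles.

```r
# Shannon entropy (natural log), ignoring zero frequencies
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# abun:   alleles-by-populations matrix of counts
# region: region label of each population (one entry per column of abun)
three_level_diff <- function(abun, region) {
  w_jk <- colSums(abun) / sum(abun)              # population weights (relative sizes)
  w_k  <- tapply(w_jk, region, sum)              # region weights w_{+k}
  pooled <- sapply(names(w_k), function(r)
    rowSums(abun[, region == r, drop = FALSE]))  # allele counts pooled within regions
  H_gamma  <- shannon(rowSums(abun) / sum(abun))
  H_alpha2 <- sum(w_k  * apply(pooled, 2, function(x) shannon(x / sum(x))))
  H_alpha1 <- sum(w_jk * apply(abun,   2, function(x) shannon(x / sum(x))))
  c(Delta2 = (H_gamma - H_alpha2) / shannon(w_k),                     # Eq. (C5)
    Delta1 = (H_alpha2 - H_alpha1) / (shannon(w_jk) - shannon(w_k)))  # Eq. (C6)
}

# Allele counts from the worked example (6 alleles, 5 populations, 2 regions)
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
res_example <- three_level_diff(Data, c("Region1", "Region1", "Region2", "Region2", "Region2"))

# Made-up case: four populations in two regions with no shared alleles
abun_distinct <- rbind(c(4, 0, 0, 0), c(1, 0, 0, 0), c(0, 6, 0, 0),
                       c(0, 0, 3, 0), c(0, 0, 2, 0), c(0, 0, 0, 9))
res_distinct <- three_level_diff(abun_distinct, c("R1", "R1", "R2", "R2"))

print(round(res_example, 3))
print(round(res_distinct, 3))
```

For the completely distinct case both measures equal 1, in line with the special-case decomposition above; for the worked example the values should agree with the Differentiation2 and Differentiation1 rows of the iDIP output to the reported precision.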
As shown in the proof of Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The gamma and alpha phylogenetic entropies are expressed as (Table 3 of the main text)

\[ I_\gamma = -\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k}, \]

\[ I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}. \]

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

\[ I_\gamma = I_\alpha^{(1)} + \big[ I_\alpha^{(2)} - I_\alpha^{(1)} \big] + \big[ I_\gamma - I_\alpha^{(2)} \big]. \tag{C7} \]
Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

\[ 0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8} \]

\[ 0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C9} \]

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have

\[ I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} \quad \text{and} \quad I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}. \]

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
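The role of the tree depth T in the phylogenetic bounds can be illustrated with the two-level analogue of Eq. (C8). The sketch below uses a made-up ultrametric tree of depth 4 with five tips and three internal nodes; the branch lengths, frequencies, weights, and function names are all assumptions chosen for illustration, not values from the original material.

```r
# Branch lengths of a hypothetical ultrametric tree with depth 4:
# five tip branches, then the branches above nodes (1,2), (1,2,3) and (4,5)
L <- c(1, 1, 2, 3, 3, 1, 2, 1)
T_depth <- 4

# Node abundances for one population, computed from its five tip frequencies p
node_abund <- function(p) c(p, p[1] + p[2], p[1] + p[2] + p[3], p[4] + p[5])

# Phylogenetic entropy I = -sum(L_i * a_i * ln a_i), skipping empty branches
phylo_entropy <- function(a) {
  keep <- a > 0
  -sum(L[keep] * a[keep] * log(a[keep]))
}

# Normalized phylogenetic differentiation among populations (two-level case)
phylo_diff <- function(P, w) {                   # P: tips-by-populations frequencies
  A <- apply(P, 2, node_abund)                   # nodes-by-populations abundances
  I_gamma <- phylo_entropy(as.vector(A %*% w))
  I_alpha <- sum(w * apply(A, 2, phylo_entropy))
  (I_gamma - I_alpha) / (-T_depth * sum(w * log(w)))
}

w <- c(0.4, 0.6)

# Population 1 holds tips 1-3 only, population 2 holds tips 4-5 only:
# no branch is shared between the two populations
distinct <- cbind(c(0.5, 0.2, 0.3, 0, 0), c(0, 0, 0, 0.6, 0.4))

# Two populations with the same tip frequencies
same <- cbind(c(0.3, 0.1, 0.2, 0.25, 0.15), c(0.3, 0.1, 0.2, 0.25, 0.15))

d_distinct <- phylo_diff(distinct, w)
d_same     <- phylo_diff(same, w)
print(c(distinct = d_distinct, same = d_same))
```

The normalization by T works because, on an ultrametric tree, the branch lengths weighted by node abundances sum to T within every population.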
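SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represent the ecosystem. Shannon entropies, $H_{\ast}$, are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1: diagram with columns "Within", "Between", "Total", and "Decomposition", and rows for hierarchy levels 3 (Ecosystem), 2 (Region), and 1 (Population or Community). Populations 1 to 3 (frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$; entropies $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$) belong to region 1 ($p_{i|+1}$, $H_{\alpha 1}^{(2)}$); populations 4 and 5 ($p_{i|42}$, $p_{i|52}$; $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$) belong to region 2 ($p_{i|+2}$, $H_{\alpha 2}^{(2)}$); the ecosystem level uses $p_{i|++}$ and $H_\gamma$, with totals $H_\alpha^{(1)}$ and $H_\alpha^{(2)}$.]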
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set, where the first elements (in the present example, 5) correspond to species/alleles abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2: ultrametric tree with tips $p_1$ to $p_5$, internal nodes $p_1 + p_2$, $p_1 + p_2 + p_3$, and $p_4 + p_5$, and the root labelled MRCA.]
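The extended abundance set in Figure S2 can be written out in a few lines of R. In the sketch below the species abundances and the branch lengths are made up (the figure gives no numbers); the branch lengths are chosen so that the tree is ultrametric with depth 4. The entropy `G` follows the same form as the `G` line in the iDIPphylo function given later in this document.

```r
# Hypothetical relative abundances of the five species in Figure S2
p <- c(0.30, 0.10, 0.20, 0.25, 0.15)

# Extended abundance set: five tips followed by the three internal nodes
a <- c(p, p[1] + p[2], p[1] + p[2] + p[3], p[4] + p[5])

# Hypothetical branch lengths L_i in the same order as a; every root-to-tip
# path sums to 4, so the tree is ultrametric
L <- c(1, 1, 2, 3, 3,   # branches leading to tips 1-5
       1,               # branch above node (1,2)
       2,               # branch above node (1,2,3)
       1)               # branch above node (4,5)

T_bar <- sum(L * a)                             # abundance-weighted mean tree depth
G <- -sum(L * (a / T_bar) * log(a / T_bar))     # entropy of the extended set
PD_gamma <- exp(G)                              # effective total branch length (order 1)

# Equivalent form via the phylogenetic entropy I = -sum(L * a * log(a))
I <- -sum(L * a * log(a))
PD_gamma_alt <- T_bar * exp(I / T_bar)

print(c(T_bar = T_bar, PD_gamma = PD_gamma, Faith_PD = sum(L)))
```

For an ultrametric tree `T_bar` equals the tree depth, and the effective total branch length lies between the tree depth and the total branch length (Faith's PD).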
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider that there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1
        Population 1    Population 2
    Region 2
        Population 3    Population 4    Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

           Pop1   Pop2   Pop3   Pop4   Pop5
Allele1    1      16     2      10     15
Allele2    0      0      0      5      14
Allele3    7      12     11     1      0
Allele4    0      5      14     1      21
Allele5    2      1      0      11     10
Allele6    0      1      3      2      0
The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

          Pop1          Pop2          Pop3          Pop4          Pop5
Level3    Ecosystem     Ecosystem     Ecosystem     Ecosystem     Ecosystem
Level2    Region1       Region1       Region2       Region2       Region2
Level1    Population1   Population2   Population3   Population4   Population5
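Because iDIP cannot handle blank or NA entries and every population must contain at least one species/allele, a small pre-check before calling the function may be useful. The helper below is a hypothetical sketch, not part of the original iDIP code; its name `check_idip_input`, its error messages, and the choice to replace NA by 0 (one of the options the text allows) are assumptions.

```r
# Hypothetical helper (not part of the original iDIP code): prepares an
# abundance matrix so that it meets the input requirements stated above
check_idip_input <- function(abun, struc) {
  abun <- as.matrix(abun)
  abun[is.na(abun)] <- 0                        # replace NA entries by 0
  if (any(abun < 0))
    stop("abundances must be non-negative")
  if (any(colSums(abun) == 0))
    stop("every population/community needs at least one species/allele")
  if (ncol(struc) != ncol(abun))
    stop("struc must have one column per population/community")
  abun
}

# Small example with one missing entry
Data_raw <- cbind(c(1, NA, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1))
Struc_small <- rbind(rep("Ecosystem", 2), rep("Region1", 2),
                     paste0("Population", 1:2))
Data_clean <- check_idip_input(Data_raw, Struc_small)
print(Data_clean)
```

The cleaned matrix can then be passed on together with the structure matrix; an empty population or a mismatch in the number of columns stops with an error instead of producing NaN values later.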
For the above simple example, input data for R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

# Run R function
iDIP(Data, Struc)
Output is given below.

                   [,1]
D_gamma            5.272
D_alpha2           4.679
D_alpha1           3.540
D_beta2            1.127
D_beta1            1.322
Proportion2        0.300
Proportion1        0.700
Differentiation2   0.204
Differentiation1   0.310
We give simple interpretations for the above output ("effective" is always in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).

(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).

(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
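The relationships quoted in the interpretation can be re-derived from the reported numbers. The sketch below only uses the rounded values from the output table, so agreement is expected to about three decimals; the expression for the proportions follows from the way `Prop` and `B` are defined in the iDIP function (differences of log diversities divided by the total).

```r
# Values as reported in the iDIP output above (rounded to three decimals)
D_gamma  <- 5.272
D_alpha2 <- 4.679
D_alpha1 <- 3.540
D_beta2  <- 1.127
D_beta1  <- 1.322

# Multiplicative partitioning: alpha x beta at one level gives the level above
gamma_from_parts  <- D_alpha2 * D_beta2     # should be close to D_gamma
alpha2_from_parts <- D_alpha1 * D_beta1     # should be close to D_alpha2

# Share of the total beta information (on the log scale) held at each level
prop2 <- log(D_beta2) / log(D_beta2 * D_beta1)   # regional level
prop1 <- log(D_beta1) / log(D_beta2 * D_beta1)   # population level

print(round(c(gamma_from_parts = gamma_from_parts,
              alpha2_from_parts = alpha2_from_parts,
              prop2 = prop2, prop1 = prop1), 3))
```

Small discrepancies in the last digit come from rounding in the printed table, not from the decomposition itself.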
R code for Information-based Diversity Partitioning for Any Number of Levels

# Input data:
#   abun:  alleles-by-population frequency matrix (or species-by-community frequency matrix)
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix)
# Output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
#   (2) Proportion of total beta information found at each level.
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.

# Main function iDIP
iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga > 0] / n                      # pooled relative frequencies
  G = sum(-gp * log(gp))                   # gamma entropy
  S = length(gp)

  H = nrow(struc)                          # number of hierarchical levels
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  # Lowest level (populations/communities)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H - 1] = sum(wi * Ai)

  # Intermediate levels (regions, subregions, ...)
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          ai[, j] = abun[, II]
        } else {
          ai[, j] = rowSums(abun[, II])
        }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i - 1] = sum(wi * Ai)
    }
  }

  # Differentiation, proportion of beta information, and beta diversity
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }

  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha", (H - 1):1),
                     paste0("D_beta", (H - 1):1),
                     paste0("Proportion", (H - 1):1),
                     paste0("Differentiation", (H - 1):1))
  return(out)
}
Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described for species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see the simple example given above for species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for the 6 alleles is given below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) = paste0("Allele",1:6)
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

# Run R function
iDIPphylo(Data, Struc, Tree)
Output is given below.

              [,1]
Faith's PD    321.525
mean_T        94.169
PD_gamma      274.388
PD_alpha2     255.194
PD_alpha1     223.231
PD_beta2      1.075
PD_beta1      1.143
PD_prop2      0.351
PD_prop1      0.649
PD_diff2      0.124
PD_diff1      0.149
"Effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma).

(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2).

(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

# Input data:
#   abun:  alleles-by-population frequency matrix (or species-by-community frequency matrix)
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix)
#   tree:  a Newick-format phylogenetic tree spanned by all focal species considered in a study
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.
# Output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root node
#       to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level.
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information
#       found in Level 1 and Level 2, respectively.
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1
#       measures the mean phylogenetic dissimilarity among communities (Level 1) within
#       a region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among
#       regions (Level 2), etc.

# Three packages, "ade4", "ape", and "phytools", must be installed first
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

# Main function iDIPphylo
iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))

  # M[i, node] = 1 if the node lies on the path from the root to leaf i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }

  # pA: node-by-population abundances; pB: branch lengths
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
  pB = c(phyloData$leaves, phyloData$nodes)

  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n; TT = sum(gp * pB)           # TT: abundance-weighted mean tree depth
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])                     # Faith's PD

  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  # Lowest level (populations/communities)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H - 1] = sum(wi * Ai)

  # Intermediate levels (regions, subregions, ...)
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else {
          pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
        }
      }
      wi = ni / sum(ni)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i - 1] = sum(wi * Ai)
    }
  }

  # Differentiation, proportion of beta information, and beta diversity
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }

  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H - 1):1),
                     paste0("PD_beta", (H - 1):1),
                     paste0("PD_prop", (H - 1):1),
                     paste0("PD_diff", (H - 1):1))
  return(out)
}
2
119925(119947119948 total number of allelesindividuals in population j of region k 119925((119948 total number of allelesindividuals in region k 119925((( total number of allelesindividuals in the ecosystem 119953119946|119947119948 frequency of allelespecies i in population j of region k 119953119946|(119948 frequency of allelespecies i in region k 119953119946|(( frequency of allelespecies i in the ecosystem 119919120630119947119948(120783) alpha entropy for population j of region k
119919120630119948(120784) alpha entropy for region k
119919120630(119949) total alpha entropy at level l of the hierarchy
120491119915(119949) abundance diversity differentiation among aggregates (l =
populationscommunities regions) B number of branch segments in the phylogenetic tree 119923119946 length of branch 119894 = 123⋯ 119861 119938119946 total relative abundance of elements (allelesspecies) descended from
the ith nodebranch 119931E mean branch length T depth of an ultrametric tree ( = 119879G) 119928 Raorsquos quadratic entropy I phylogenetic entropy
119938119946|119947119948 total relative abundance of allelespecies descended from node i in populationcommunity j of region k
119938119946|(119948 total relative abundance of allelespecies descended from node i across all populationscommunities in region k
119938119946|(( total relative abundance of allelespecies descended from node i across all populations and regions in the ecosystem
119927119915120630(119949) alpha phylogenetic diversity at level l of the hierarchy
119927119915120631(119949) beta phylogenetic diversity at level l of the hierarchy
119927119915120632(119949) gamma phylogenetic diversity at level l of the hierarchy
119927119915120632 gamma phylogenetic diversity at the ecosystem level
119920120630119947119948(120783) alpha phylogenetic entropy for population j of region k
119920120630119948(120784) alpha phylogenetic entropy for region k
119920120630(119949) total alpha phylogenetic entropy at level l of the hierarchy
120491119927119915(119949) phylogenetic diversity differentiation among aggregates (l =
populationscommunities regions)
APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are $J$ populations and $S$ alleles in an ecosystem. Denote the relative frequency of allele $i$ within population $j$ as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any $j = 1, 2, \dots, J$. For any given population weights $(w_1, w_2, \dots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele $i$ in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$
Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the $J$ populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele $i$ we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \dots = p_{i|J}$ for any allele $i = 1, 2, \dots, S$, i.e., all $J$ populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big) \le -\sum_{i=1}^{S} \sum_{j=1}^{J} w_j p_{i|j} \ln\big(w_j p_{i|j}\big)$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$

When all $J$ populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$
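As a numerical illustration of Eq. (C1), the following is a minimal R sketch, not the authors' code. The function names `shannon_entropy` and `shannon_diff` and the three toy matrices are hypothetical, and the weights are assumed to be relative population sizes unless supplied. It checks the two endpoints (identical and completely distinct populations) and a case with S = 4 equally common species per community of which A = 1 is shared.

```r
# Shannon entropy of a probability vector, treating 0 * log(0) as 0.
shannon_entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Two-level Shannon differentiation, Eq. (C1).
# abun: alleles (rows) x populations (columns), raw or relative frequencies.
# w: population weights; the default (relative population sizes) is an assumption.
shannon_diff <- function(abun, w = colSums(abun) / sum(abun)) {
  p_within <- sweep(abun, 2, colSums(abun), "/")    # p_{i|j}
  p_pooled <- as.vector(p_within %*% w)             # p_{i|+}
  H_gamma  <- shannon_entropy(p_pooled)
  H_alpha  <- sum(w * apply(p_within, 2, shannon_entropy))
  (H_gamma - H_alpha) / shannon_entropy(w)
}

# Hypothetical toy data.
identical_pops <- cbind(c(2, 3, 5), c(4, 6, 10))          # same frequencies
distinct_pops  <- cbind(c(2, 3, 0, 0), c(0, 0, 1, 4))     # no shared alleles
shared_one     <- cbind(c(1, 1, 1, 1, 0, 0, 0),           # S = 4 per community,
                        c(1, 0, 0, 0, 1, 1, 1))           # A = 1 shared

d_identical <- shannon_diff(identical_pops)
d_distinct  <- shannon_diff(distinct_pops)
d_shared    <- shannon_diff(shared_one)
```

Under these assumptions the three values are 0, 1 and 1 - A/S = 0.75 (up to floating-point error), in line with Theorem S3.1 and the "true dissimilarity" property of Theorem S3.2(B).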
The measure in Eq. (C1) takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the $J$ populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.
(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: if multiple communities each have $S$ equally common species, with exactly $A$ species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives $1 - A/S$, the true proportion of non-shared species in a community.

Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, the Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix B for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are $K$ regions with allele relative frequency $p_{i|+k}$ for allele $i$ in region $k$, with the region weights $(w_{+1}, w_{+2}, \dots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} w_{+k} p_{i|+k}\Big) \ln\Big(\sum_{m=1}^{K} w_{+m} p_{i|+m}\Big).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.
To prove Eq. (C4), we consider the following two-level hierarchy (region $k$ and all populations within region $k$): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele $i$ in population $j$, with population weights $(w_{1k}/w_{+k},\ w_{2k}/w_{+k},\ \dots,\ w_{J_k k}/w_{+k})$, $j = 1, 2, \dots, J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region $k$, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln\frac{w_{jk}}{w_{+k}}.$$

Summing over $k$ with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}.$$
This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\big(w_{jk}/w_{+k}\big)}. \tag{C6}$$
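The two measures in Eqs. (C5) and (C6) can be sketched in a few lines of R. This is an illustrative sketch rather than the authors' implementation: the function `three_level_diff`, the toy matrices and the region labels are hypothetical, and the weights are assumed to be relative population sizes. The denominator of Eq. (C6) is computed as the entropy of the population weights minus the entropy of the region weights, which is algebraically the same quantity.

```r
entropy <- function(p) {
  p <- p[p > 0]                 # 0 * log(0) is treated as 0
  -sum(p * log(p))
}

# abun: alleles (rows) x populations (columns) counts.
# region: region label of each population (one per column).
three_level_diff <- function(abun, region) {
  w_jk <- colSums(abun) / sum(abun)              # population weights w_{jk}
  reg  <- rowsum(t(abun), group = region)        # regions x alleles pooled counts
  w_k  <- rowSums(reg) / sum(reg)                # region weights w_{+k}
  p_jk <- sweep(abun, 2, colSums(abun), "/")     # p_{i|jk}
  p_k  <- reg / rowSums(reg)                     # p_{i|+k}, one region per row
  H_gamma  <- entropy(rowSums(abun) / sum(abun))
  H_alpha2 <- sum(w_k * apply(p_k, 1, entropy))
  H_alpha1 <- sum(w_jk * apply(p_jk, 2, entropy))
  c(regions     = (H_gamma - H_alpha2) / entropy(w_k),                     # Eq. (C5)
    populations = (H_alpha2 - H_alpha1) / (entropy(w_jk) - entropy(w_k)))  # Eq. (C6)
}

region <- c("R1", "R1", "R2")
# Completely distinct populations (no shared alleles).
distinct  <- cbind(c(3, 1, 0, 0, 0, 0), c(0, 0, 2, 2, 0, 0), c(0, 0, 0, 0, 1, 5))
# Identical allele frequency distributions in all populations.
identical <- cbind(c(1, 2, 3), c(2, 4, 6), c(3, 6, 9))

d_distinct  <- three_level_diff(distinct, region)
d_identical <- three_level_diff(identical, region)
```

With these toy inputs both measures equal 1 for the distinct case and 0 for the identical case (up to floating-point error), matching the stated range of the two measures.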
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$H_\gamma = -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\big(w_{jk} p_{i|jk}\big)$$

$$= -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big[\ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big]$$

$$= -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk} - \sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\frac{w_{jk}}{w_{+k}} - \sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k}$$

$$= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k}$$

$$\equiv H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big].$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy
For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[I_\alpha^{(2)} - I_\alpha^{(1)}\big] + \big[I_\gamma - I_\alpha^{(2)}\big]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth $T$:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
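To make the phylogenetic bound concrete, here is a small R sketch of the two-level analogue of Eq. (C8): the difference between the gamma and alpha phylogenetic entropies, divided by its maximum $-T \sum_j w_j \ln w_j$. Everything in it is an assumption made for illustration: the four-tip ultrametric tree of depth 3, the branch-by-tip matrix `desc`, the function names and the abundances are hypothetical and are not taken from the paper.

```r
# Hypothetical ultrametric tree of depth 3: clades X = (A, B) and Y = (C, D).
# Tip branches have length 1; the branches leading to X and Y have length 2.
L <- c(A = 1, B = 1, C = 1, D = 1, X = 2, Y = 2)         # branch lengths L_i
# desc[b, t] = 1 if tip t descends from branch b (rows: branches, columns: tips).
desc <- rbind(A = c(1, 0, 0, 0), B = c(0, 1, 0, 0),
              C = c(0, 0, 1, 0), D = c(0, 0, 0, 1),
              X = c(1, 1, 0, 0), Y = c(0, 0, 1, 1))

# Phylogenetic entropy I = -sum_i L_i a_i ln a_i for tip frequencies p.
phylo_entropy <- function(p) {
  a <- as.vector(desc %*% p)            # node abundances a_i
  keep <- a > 0                         # 0 * log(0) is treated as 0
  -sum(L[keep] * a[keep] * log(a[keep]))
}

# Normalized phylogenetic differentiation for a two-level hierarchy.
# P: tips x populations relative abundances; w: population weights.
phylo_diff <- function(P, w, depth) {
  I_gamma <- phylo_entropy(as.vector(P %*% w))
  I_alpha <- sum(w * apply(P, 2, phylo_entropy))
  (I_gamma - I_alpha) / (-depth * sum(w * log(w)))
}

w <- c(0.5, 0.5)
P_distinct  <- cbind(c(0.5, 0.5, 0, 0), c(0, 0, 0.3, 0.7))   # no shared branches
P_identical <- cbind(c(0.1, 0.2, 0.3, 0.4), c(0.1, 0.2, 0.3, 0.4))

d_distinct  <- phylo_diff(P_distinct, w, depth = 3)
d_identical <- phylo_diff(P_identical, w, depth = 3)
```

In this toy setting the measure is 1 when the two populations occupy different clades and 0 when they have identical abundances, which are the two extreme cases described in Theorem S3.4.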
SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represents the ecosystem. Shannon entropies $H_{*}$ are calculated from allele/species abundances at each level $h$ of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1: diagram of the three hierarchical levels (3 Ecosystem, 2 Region, 1 Population or Community) against the columns Within, Between, Total and Decomposition. It shows the frequencies $p_{i|11}, p_{i|21}, p_{i|31}, p_{i|42}, p_{i|52}$ (populations), $p_{i|+1}, p_{i|+2}$ (regions) and $p_{i|++}$ (ecosystem), together with the entropies $H_{\alpha 11}^{(1)}, \dots, H_{\alpha 52}^{(1)}$, $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$, $H_\alpha^{(1)}$, $H_\alpha^{(2)}$ and $H_\gamma$.]

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \dots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \dots, p_5,\ p_1 + p_2,\ p_1 + p_2 + p_3,\ p_4 + p_5\}$, where $p_i$ is the relative abundance of species $i$.

[Figure S2: ultrametric tree with tips $p_1, \dots, p_5$, internal nodes $p_1 + p_2$, $p_4 + p_5$ and $p_1 + p_2 + p_3$, and the root labelled MRCA.]
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1
        Population 1, Population 2
    Region 2
        Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

          Pop1  Pop2  Pop3  Pop4  Pop5
Allele1      1    16     2    10    15
Allele2      0     0     0     5    14
Allele3      7    12    11     1     0
Allele4      0     5    14     1    21
Allele5      2     1     0    11    10
Allele6      0     1     3     2     0
The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure of any number of levels can be expressed in a similar manner.

          Pop1         Pop2         Pop3         Pop4         Pop5
Level3    Ecosystem    Ecosystem    Ecosystem    Ecosystem    Ecosystem
Level2    Region1      Region1      Region2      Region2      Region2
Level1    Population1  Population2  Population3  Population4  Population5
For the above simple example, the input data for R function iDIP (code given below) are shown below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))
    # Run R function
    iDIP(Data, Struc)
Output is given below:

                       [,1]
    D_gamma           5.272
    D_alpha.2         4.679
    D_alpha.1         3.540
    D_beta.2          1.127
    D_beta.1          1.322
    Proportion.2      0.300
    Proportion.1      0.700
    Differentiation.2 0.204
    Differentiation.1 0.310
We give simple interpretations for the above output (throughout, "effective" is in the sense of equally abundant alleles/populations/regions).
(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha.2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta.2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 ≈ 5.272 (= D_gamma, up to rounding).
(1c) D_alpha.1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta.1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 ≈ 4.679 allele equivalents per region (= D_alpha.2).
(2) Proportion.2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion.1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation.2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation.1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
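The effective numbers and proportions in the output above can be recomputed directly from the example matrix with a few lines of base R. The sketch below is an independent check written for this purpose, not the iDIP function itself; the helper `entropy` and the object names are hypothetical, and the weights are assumed to be relative population sizes, as in the example.

```r
abun <- cbind(Pop1 = c(1, 0, 7, 0, 2, 0),   Pop2 = c(16, 0, 12, 5, 1, 1),
              Pop3 = c(2, 0, 11, 14, 0, 3), Pop4 = c(10, 5, 1, 1, 11, 2),
              Pop5 = c(15, 14, 0, 21, 10, 0))
region <- c("Region1", "Region1", "Region2", "Region2", "Region2")

# Shannon entropy of a vector of counts (zeros are skipped).
entropy <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

reg <- t(rowsum(t(abun), group = region))       # alleles x regions pooled counts
H_gamma  <- entropy(rowSums(abun))
H_alpha2 <- sum(colSums(reg)  / sum(reg)  * apply(reg,  2, entropy))
H_alpha1 <- sum(colSums(abun) / sum(abun) * apply(abun, 2, entropy))

# Effective numbers: exponentials of entropies and of entropy differences.
res <- c(D_gamma      = exp(H_gamma),
         D_alpha.2    = exp(H_alpha2),
         D_alpha.1    = exp(H_alpha1),
         D_beta.2     = exp(H_gamma - H_alpha2),
         D_beta.1     = exp(H_alpha2 - H_alpha1),
         Proportion.2 = (H_gamma - H_alpha2) / (H_gamma - H_alpha1),
         Proportion.1 = (H_alpha2 - H_alpha1) / (H_gamma - H_alpha1))
print(round(res, 3))
```

Because beta is the ratio of consecutive effective numbers, the products D_alpha.2 x D_beta.2 and D_alpha.1 x D_beta.1 recover D_gamma and D_alpha.2 exactly before rounding, and the two proportions sum to 1.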
R code for Information-based Diversity Partitioning for Any Number of Levels

    # Input data:
    #   abun:  alleles-by-population frequency matrix data (or species-by-community
    #          frequency matrix data)
    #   struc: level-by-population hierarchical structure matrix (or level-by-community
    #          hierarchical structure matrix)
    # Output:
    #   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
    #   (2) Proportion of total beta information found at each level.
    #   (3) Differentiation (dissimilarity) for each level. For example, Differentiation.1
    #       measures the mean dissimilarity among populations (level 1) within a region;
    #       Differentiation.2 measures the mean dissimilarity among regions (level 2), etc.
    # NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
    #       or "NA" in your data by 0 or any numerical value. Also, there must be at least
    #       one species/allele in a population or a community.
Main function iDIP

    iDIP = function(abun, struc) {
      n = sum(abun); N = ncol(abun)
      ga = rowSums(abun)
      gp = ga[ga > 0] / n                       # pooled allele frequencies
      G = sum(-gp * log(gp))                    # gamma entropy
      S = length(gp)
      H = nrow(struc)                           # number of hierarchical levels
      A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
      Diff = numeric(H-1); Prop = numeric(H-1)
      # lowest level (populations): weights, weight entropy and alpha entropy
      wi = colSums(abun) / n
      W[H-1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[H-1] = sum(wi * Ai)
      # intermediate levels: pool the populations belonging to each aggregate
      if (H > 2) {
        for (i in 2:(H-1)) {
          I = unique(struc[i, ]); NN = length(I)
          ai = matrix(0, ncol = NN, nrow = nrow(abun))
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) { ai[, j] = abun[, II]
            } else { ai[, j] = rowSums(abun[, II]) }
          }
          pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
          wi = colSums(ai) / sum(ai)
          W[i-1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
          A[i-1] = sum(wi * Ai)
        }
      }
      total = G - A[H-1]                        # total beta information
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H-1)) {
          Diff[i] = (A[i-1] - A[i]) / (W[i] - W[i-1])
          Prop[i] = (A[i-1] - A[i]) / total
          B[i] = exp(A[i-1]) / exp(A[i])
        }
      }
      Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
      out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) <- c(paste0("D_gamma"),
                         paste0("D_alpha.", (H-1):1),
                         paste0("D_beta.", (H-1):1),
                         paste0("Proportion.", (H-1):1),
                         paste0("Differentiation.", (H-1):1))
      return(out)
    }
Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described for species diversity, we also need to input a phylogenetic "Tree" in Newick tree format.
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for the species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    rownames(Data) = paste0("Allele",1:6)
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
    Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")
    # Run R function
    iDIPphylo(Data, Struc, Tree)
Output is given below:

                   [,1]
    Faith's PD  321.525
    mean_T       94.169
    PD_gamma    274.388
    PD_alpha.2  255.194
    PD_alpha.1  223.231
    PD_beta.2     1.075
    PD_beta.1     1.143
    PD_prop.2     0.351
    PD_prop.1     0.649
    PD_diff.2     0.124
    PD_diff.1     0.149
The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.
(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha.2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta.2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 ≈ 274.388 (= PD_gamma, up to rounding).
(3c) PD_alpha.1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta.1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 ≈ 255.194 (= PD_alpha.2, up to rounding).
(4) PD_prop.2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop.1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff.2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff.1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
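The first three rows of the phylogenetic output can be checked without any tree package by listing the nine branches of the example tree by hand. The sketch below is such a check, not the iDIPphylo function. The branch lengths are read from the Newick string of the example, with decimal points placed so that they sum to the reported Faith's PD (an assumption about the extraction-damaged string); the labels `n12`, `n123` and `n456` for the internal branches and all object names are hypothetical.

```r
abun <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
p <- rowSums(abun) / sum(abun)          # pooled relative abundances of the 6 alleles

# Branch lengths: six tip branches, then the three internal branches
# (n12 = ancestor of alleles 1-2, n123 = ancestor of 1-3, n456 = ancestor of 4-6).
L <- c(Allele1 = 16.66254448, Allele2 = 28.86156926, Allele3 = 59.19367445,
       Allele4 = 9.67060281,  Allele5 = 49.65919121, Allele6 = 15.361314,
       n12 = 43.70264926, n123 = 43.49065302, n456 = 54.92297125)

# desc[b, t] = 1 if allele t descends from branch b.
desc <- rbind(diag(6),
              n12  = c(1, 1, 0, 0, 0, 0),
              n123 = c(1, 1, 1, 0, 0, 0),
              n456 = c(0, 0, 0, 1, 1, 1))
a <- as.vector(desc %*% p)              # abundance descended from each branch

faith_pd <- sum(L[a > 0])               # total branch length
mean_T   <- sum(L * a)                  # abundance-weighted mean root-to-tip distance
G        <- -sum(L * a / mean_T * log(a / mean_T))   # all a > 0 in this example
pd_gamma <- exp(G)                      # effective total branch length

print(round(c(faith_pd = faith_pd, mean_T = mean_T, pd_gamma = pd_gamma), 3))
```

The sum of L_i a_i equals the weighted mean root-to-tip distance because each tip contributes its abundance to every branch on its path to the root; the entropy G uses the same normalization by mean_T as the function listed below.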
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

    # Input data:
    #   abun:  alleles-by-population frequency matrix data (or species-by-community
    #          frequency matrix data)
    #   struc: level-by-population hierarchical structure matrix (or level-by-community
    #          hierarchical structure matrix)
    #   tree:  a Newick-format phylogenetic tree spanned by all focal species considered
    #          in a study
    # NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA"
    #       in your data by 0 or any numerical value. Also, there must be at least one
    #       species/allele in a population or a community.
    # Output:
    #   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
    #   (2) mean T: weighted (by species abundance) mean of the distances from the root
    #       node to each of the tips in a phylogenetic tree. For an ultrametric tree,
    #       mean T = tree depth.
    #   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD
    #       for each level.
    #   (4) PD_prop.1 and PD_prop.2 measure the proportions of total phylogenetic beta
    #       information found in Level 1 and Level 2, respectively.
    #   (5) Phylogenetic differentiation (dissimilarity) for each level. For example,
    #       PD_diff.1 measures the mean phylogenetic dissimilarity among communities
    #       (Level 1) within a region (Level 2); PD_diff.2 measures the mean phylogenetic
    #       dissimilarity among regions (Level 2), etc.
    # Three packages, "ade4", "ape" and "phytools", must be installed first.
    install.packages("ade4")
    library(ade4)
    install.packages("ape")
    library(ape)
    install.packages("phytools")
    library(phytools)

Main function iDIPphylo
    iDIPphylo = function(abun, struc, tree) {
      phyloData <- newick2phylog(tree)
      Temp <- as.matrix(abun[names(phyloData$leaves), ])
      nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
      # M[i, b] = 1 if branch b lies on the path from the root to leaf i
      M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
                 dimnames = list(names(phyloData$leaves), nodenames))
      for (i in 1:length(phyloData$leaves)) {
        M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
      }
      # pA: abundance descended from each branch, by population
      pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
                  dimnames = list(nodenames, colnames(abun)))
      for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
      pB = c(phyloData$leaves, phyloData$nodes)     # branch lengths
      n = sum(abun); N = ncol(abun)
      ga = rowSums(pA)
      gp = ga / n; TT = sum(gp * pB)                # TT: mean root-to-tip distance
      G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
      PD = sum(pB[gp > 0])                          # Faith's PD
      H = nrow(struc)
      A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
      Diff = numeric(H-1); Prop = numeric(H-1)
      wi = colSums(abun) / n
      W[H-1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[H-1] = sum(wi * Ai)
      if (H > 2) {
        for (i in 2:(H-1)) {
          I = unique(struc[i, ]); NN = length(I)
          pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) { pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
            } else { pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II]) }
          }
          # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
          wi = ni / sum(ni)
          W[i-1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
          A[i-1] = sum(wi * Ai)
        }
      }
      total = G - A[H-1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H-1)) {
          Diff[i] = (A[i-1] - A[i]) / (W[i] - W[i-1])
          Prop[i] = (A[i-1] - A[i]) / total
          B[i] = exp(A[i-1]) / exp(A[i])
        }
      }
      # Gamma = exp(G)*TT; Alpha = exp(A)*TT; Diff = Diff; Prop = Prop
      Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
      out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
      # out1 = iDIP(abun, struc); out = cbind(out1, out2)
      rownames(out) <- c(paste("Faith's PD"),
                         paste("mean_T"),
                         paste0("PD_gamma"),
                         paste0("PD_alpha.", (H-1):1),
                         paste0("PD_beta.", (H-1):1),
                         paste0("PD_prop.", (H-1):1),
                         paste0("PD_diff.", (H-1):1))
      return(out)
    }
3
APPENDIX S3 Derivation of Differentiation Measures We derive all differentiation measures in terms of allele frequencies but note that all derivations are valid for species diversity it suffices to replace the term ldquoallele frequencyrdquo with ldquospecies frequencyrdquo We first present the details for two-level hierarchy and then extend all procedures to three-level hierarchy Generalization to an arbitrary number of levels is parallel Shannon differentiation measure in two-level hierarchy (ecosystem and populations) Assume that there are J populations and S alleles in an ecosystem Denote the relative frequency of allele i within population j as 119901K|L sum 119901K|LN
KOP = 1 for any j = 1 2 hellip J For any given population weights Q119908P 119908S⋯ 119908TUsum 119908L
TLOP = 1 the relative frequency of allele i in
the ecosystem becomes 119901K|( sum 119908L119901K|LTLOP The gamma and alpha Shannon entropies can be
expressed as 119867W = minussum 119901K|( ln 119901K|(N
KOP = minussum Qsum 119908L119901K|LTLOP U lnQsum 119908L119901K|[
T[OP UN
KOP and
119867 = minussum 119908L sum 119901K|L ln 119901K|LNKOP
TLOP
Theorem S31 For the gamma and alpha entropies defined above for two-level hierarchy we have the following inequalities
0 le 119867W minus 119867 le minussum 119908L ln119908LTLOP
When all populations have identical allele relative frequency distributions we have 119867W minus119867 = 0 when the J populations are completely distinct (no shared alleles) we have 119867W minus119867 = minussum 119908L
TLOP ln119908L
Proof Since 119891(119909) = minus119909 log119909 is a concave function it follows from the Jensen inequality that for any allele i we have
minusQsum 119908LTLOP 119901K|LU lnQsum 119908L
TLOP 119901K|LU ge minussum 119908L119901K|L
TLOP ln119901K|L
Summing over all alleles we then obtain
minussum Qsum 119908LTLOP 119901K|LUN
KOP lnQsum 119908LTLOP 119901K|LU ge minussum 119908L
TLOP sum 119901K|L ln 119901K|LN
KOP
This proves 119867W ge 119867 The Jensen inequality become equality if and only if 119901K|P = 119901K|S =⋯ = 119901K|T for any allele i = 1 2 hellip S ie all J populations have identical allele frequency distributions The maximum value of 119867W minus 119867 is obtained as follows
119867W = minuscdc119908L119901K|L
T
LOP
lndc119908L119901K|[
T
[OP
eeN
KOP
le minuscdc119908L119901K|L ln119908L119901K|L
T
LOP
eN
KOP
= minussum 119908L sum 119901K|L ln 119901K|LNKOP
TLOP minus sum 119908L sum 119901K|L ln119908LN
KOPTLOP = 119867 minus sum 119908L ln119908L
TLOP
When all J populations are completely distinct (no shared alleles) the above inequality becomes an equality The proof is thus completed
4
From the above theorem the normalized differentiation measure (Shannon
differentiation) is formulated as Δg =
hijhkjsum lm nolm
pmqr
(C1)
This measure takes the minimum value of 0 when all populations have identical allele frequency distributions and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles) Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack Theorem S32 Desirable monotonicity and ldquotrue dissimilarityrdquo properties for Shannon differentiation (A) Shannon differentiation (given in Eq C1) satisfies the following monotonicity properties
that heterozygosity-based measures lack (A1) Shannon differentiation always increases when some copies of an allele that is shared
between two or more populations are replaced by copies of an unshared allele (A2) Shannon differentiation measure is always non-decreasing when a new allele is added
to a single population with any abundance Here the population weights can be a set of specified weights or relative population sizes
(B) In addition Shannon differentiation also satisfies the ldquotrue dissimilarityrdquo property If multiple communities each have S equally common species with exactly A species shared by all of them and with the remaining species in each community not shared with any other community then Shannon differentiation measure gives 1-AS the true proportion of non-shared species in a community
Proof For Part (A) see Appendix S6 in Chao Jost et al (2015) for proof details and counter-examples for Part (B) see Chao and Chiu (2016) Shannon differentiation measures in three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text From Table 3 of the main text Shannon gamma and alpha entropies are expressed as
119867W = minussum 119901K|(( ln 119901K|((NKOP 119867
(S) = minussum 119908(s sum 119901K|(s ln119901K|(sNKOP
tsOP
119867(P) = minussum sum 119908Ls sum 119901K|Ls ln 119901K|LsN
KOPTuLOP
tsOP
(See Appendix B for all notation) The well-known additive decomposition for Shannon entropy is
119867W = 119867(P) + w119867
(S) minus 119867(P)x + w119867W minus 119867
(S)x (C2)
Here 119867(P) denotes the within-population information w119867
(S) minus 119867(P)x denotes the
among-population information within a region and w119867W minus 119867(S)x denotes the
among-region information In the following theorem the maximum value for each of the
5
latter two components is derived so that we can obtain the corresponding normalized differentiation measures Theorem S33 For the decomposition given in Eq (C2) in three-level hierarchy we have the following inequalities
0 le 119867W minus 119867(S) le minussum 119908(s ln119908(st
sOP (C3) 0 le 119867
(S) minus 119867(P) le minussum sum 119908Ls ln
lmulyu
TuLOP
tsOP (C4)
When all populations have identical allele relative frequency distributions we have 119867W =119867(S) = 119867
(P) When all populations are completely distinct (no shared alleles) we have 119867
(S) minus 119867(P) = minussum sum 119908Ls ln
lmulyu
TuLOP
tsOP and 119867W minus 119867
(S) = minussum 119908(s ln119908(stsOP
Proof To prove Eq (C3) we consider the following two-level (ecosystemregion) hierarchy there are K regions with allele relative frequency 119901K|(s for species i in region k with the region weights Q119908(P 119908(S⋯ 119908(TU Note that we can express 119901K|(( as
119901K|(( = sum sum 119908Ls119901K|LsTuLOP
tsOP = sum 119908(s119901K|(st
sOP Then the ldquogammardquo entropy for this two-level system is
119867W = minussum Qsum 119908(s119901K|(slnQsum 119908([119901K|([t[OP Ut
sOP UNKOP
The corresponding ldquoalphardquo entropy for this two-level system is minussum 119908(s sum 119901K|(s ln 119901K|(sN
zOPtsOP which is 119867
(S) Eq (C3) then follows directly from Theorem C1
To prove Eq (C4) we consider the following two-level hierarchy (region k and all populations within region k) there are Jk populations with allele relative frequency 119901K|Ls for
allele i in population j with population weights lrulyu
l|ulyu
⋯ lpuulyu
119895 = 12⋯ 119869s The
ldquogammardquo entropy for this two-level hierarchy is the entropy value for region k ie 119867s(S) =
minussum 119901K|(s ln 119901K|(sNKOP where 119901K|(s = sum lmu
lyu119901K|Ls
TuLOP The corresponding ldquoalphardquo entropy is
sum lmulyu
119867Ls(P)Tu
LOP = minussum lmulyu
sum 119901K|LsNKOP ln 119901K|Ls
TuLOP Then Theorem C1 leads to
119867s(S) minusc
119908Ls119908(s
119867Ls(P)
Tu
LOP
= minusc119901K|(s ln 119901K|(s
N
KOP
+c119908Ls119908(s
c119901K|Ls
N
KOP
ln 119901K|Ls
Tu
LOP
le minusc119908Ls119908(s
ln119908Ls119908(s
Tu
LOP
Summing over k with weight 119908(sin both sides of the above inequality we obtain
minussum 119908(s sum 119901K|(s ln 119901K|(sNKOP
tsOP + sum sum 119908Ls sum 119901K|LsN
KOP ln 119901K|LsTuLOP
tsOP = 119867
(S) minus 119867(P) le
minussum 119908Ls lnlmulyu
TuLOP
This proves Eq. (C4).
From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e. the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:
$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k}\ln w_{+k}} \qquad \text{(C5)}$$
Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:
$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln(w_{jk}/w_{+k})} \qquad \text{(C6)}$$
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case each allele occurs in only one population, so that $p_{i|++} = w_{jk}\,p_{i|jk}$ for that population, and we can decompose the gamma entropy as
$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S}\left[\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln\left(w_{jk}\,p_{i|jk}\right)\right] \\
&= -\sum_{i=1}^{S}\left[\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\left(\ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k}\right)\right] \\
&= -\sum_{i=1}^{S}\left[\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln p_{i|jk}\right] - \sum_{i=1}^{S}\left[\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln\frac{w_{jk}}{w_{+k}}\right] - \sum_{i=1}^{S}\left[\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln w_{+k}\right] \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k}\ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right].
\end{aligned}$$
In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\ln w_{+k}$.
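The special case above can be checked numerically. The following R sketch uses a small hypothetical data set (three populations in two regions, with no allele shared between any two populations) and assumes population weights equal to relative population sizes; the variable names are illustrative and are not part of iDIP. Under these assumptions both normalized differentiation measures should equal 1.

```r
# Hypothetical counts: 6 alleles (rows) x 3 populations (columns), no shared alleles.
abun <- cbind(c(4, 6, 0, 0, 0, 0),
              c(0, 0, 5, 5, 0, 0),
              c(0, 0, 0, 0, 2, 8))
region <- c(1, 1, 2)                       # region membership of each population

# Shannon entropy of a vector of relative frequencies (zeros are skipped).
entropy <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

w_jk <- colSums(abun) / sum(abun)          # population weights (relative sizes)
w_k  <- sapply(split(w_jk, region), sum)   # region weights w_{+k}

# Pooled counts per region (alleles x regions).
reg_abun <- sapply(sort(unique(region)),
                   function(k) rowSums(abun[, region == k, drop = FALSE]))

H_gamma  <- entropy(rowSums(abun) / sum(abun))
H_alpha2 <- sum(w_k  * apply(reg_abun, 2, function(x) entropy(x / sum(x))))
H_alpha1 <- sum(w_jk * apply(abun,     2, function(x) entropy(x / sum(x))))

# Upper bounds from Eqs (C3) and (C4).
max_among_regions <- -sum(w_k * log(w_k))
max_among_pops    <- -sum(w_jk * log(w_jk / w_k[as.character(region)]))

diff2 <- (H_gamma  - H_alpha2) / max_among_regions   # Eq (C5)
diff1 <- (H_alpha2 - H_alpha1) / max_among_pops      # Eq (C6)
c(diff2 = diff2, diff1 = diff1)
```

Replacing `abun` by a matrix whose columns are all proportional to one another should drive both measures to 0 instead.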
As we proved in Theorem C3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem C2.

Phylogenetic differentiation measures in three-level hierarchy
For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. Shannon gamma and alpha entropies are expressed as (Table 3 of the main text)
$$I_\gamma = -\sum_{i} L_i\,a_{i|++}\ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\sum_{i} L_i\,a_{i|+k}\ln a_{i|+k},$$
$$I_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\sum_{i} L_i\,a_{i|jk}\ln a_{i|jk}.$$
Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:
$$I_\gamma = I_\alpha^{(1)} + \left[I_\alpha^{(2)} - I_\alpha^{(1)}\right] + \left[I_\gamma - I_\alpha^{(2)}\right] \qquad \text{(C7)}$$
Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem C4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth $T$:
$$0 \le I_\gamma - I_\alpha^{(2)} \le -T\sum_{k=1}^{K} w_{+k}\ln w_{+k} \qquad \text{(C8)}$$
$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}} \qquad \text{(C9)}$$
When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln(w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T\sum_{k=1}^{K} w_{+k}\ln w_{+k}$.
The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represents the ecosystem. Shannon entropies $H$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.
[Figure S1 layout: rows are levels 3 (Ecosystem), 2 (Region) and 1 (Population or Community); columns are Within, Between, Total and Decomposition. The entropies $H_\gamma$, $H_\alpha^{(2)}$ and $H_\alpha^{(1)}$ are computed from $p_{i|++}$, $p_{i|+k}$ and $p_{i|jk}$, respectively.]
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.
[Figure S2 layout: tips $p_1, \ldots, p_5$; internal nodes $p_1 + p_2$, $p_4 + p_5$ and $p_1 + p_2 + p_3$; root labelled MRCA.]
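As an illustration of the extended abundance set described in the caption of Figure S2, the following R sketch builds the eight node abundances from five tip abundances and evaluates the phylogenetic entropy $-\sum_i L_i a_i \ln a_i$. The tip abundances and branch lengths are hypothetical values chosen only so that the tree is ultrametric with depth 10; they are not taken from the figure.

```r
# Hypothetical relative abundances of the 5 tips (they sum to 1).
p <- c(p1 = 0.30, p2 = 0.10, p3 = 0.20, p4 = 0.25, p5 = 0.15)

# Extended set: the 5 tips first, then the 3 internal nodes of Figure S2.
a <- c(p,
       n12  = unname(p["p1"] + p["p2"]),
       n123 = unname(p["p1"] + p["p2"] + p["p3"]),
       n45  = unname(p["p4"] + p["p5"]))

# Hypothetical lengths of the branch above each node, in the same order as `a`.
# Every root-to-tip path has length 10, so the tree is ultrametric.
L <- c(p1 = 3, p2 = 3, p3 = 6, p4 = 5, p5 = 5, n12 = 3, n123 = 4, n45 = 5)

T_mean <- sum(L * a)             # abundance-weighted mean root-to-tip distance
I_phy  <- -sum(L * a * log(a))   # phylogenetic entropy
c(T_mean = T_mean, I_phy = I_phy)
```

For an ultrametric tree the weighted mean distance `T_mean` coincides with the tree depth, which is why it evaluates to 10 here.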
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R function iDIP)
Input should include two data matrices (called "Abun" and "Struc", respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.
Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as follows:

Ecosystem
    Region 1: Population 1, Population 2
    Region 2: Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):
Pop1 Pop2 Pop3 Pop4 Pop5
Allele1 1 16 2 10 15
Allele2 0 0 0 5 14
Allele3 7 12 11 1 0
Allele4 0 5 14 1 21
Allele5 2 1 0 11 10
Allele6 0 1 3 2 0
The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure of any number of levels can be expressed in a similar manner.
Pop1 Pop2 Pop3 Pop4 Pop5
Level3 Ecosystem Ecosystem Ecosystem Ecosystem Ecosystem
Level2 Region1 Region1 Region2 Region2 Region2
Level1 Population1 Population2 Population3 Population4 Population5
For the above simple example, the input data for R function iDIP (code given below) are shown below.
# Input data
Data = cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3), c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
Struc = rbind(rep("Ecosystem", 5), c(rep("Region1", 2), rep("Region2", 3)), paste0("Population", 1:5))
# Run R function
iDIP(Data, Struc)
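The input requirements stated above (no blank or NA entries, at least one species/allele per population, one structure column per population) can be checked before calling iDIP. The helper below is a minimal sketch; `check_idip_input` is a hypothetical name and is not part of the published code, and the non-negativity and single-ecosystem checks are additional assumptions about well-formed input.

```r
# Hypothetical pre-flight check for the two iDIP input matrices.
check_idip_input <- function(abun, struc) {
  abun  <- as.matrix(abun)
  struc <- as.matrix(struc)
  stopifnot(
    is.numeric(abun),
    !anyNA(abun),                      # no "blank" or NA entries
    all(abun >= 0),                    # assumption: frequencies are non-negative
    all(colSums(abun) > 0),            # at least one allele in every population
    ncol(struc) == ncol(abun),         # one structure column per population
    nrow(struc) >= 2,                  # at least two hierarchical levels
    length(unique(struc[1, ])) == 1    # assumption: first row names a single ecosystem
  )
  invisible(TRUE)
}

# The example data from the text.
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
Struc <- rbind(rep("Ecosystem", 5),
               c(rep("Region1", 2), rep("Region2", 3)),
               paste0("Population", 1:5))
check_idip_input(Data, Struc)
```

The function stops with an error naming the first violated condition, which makes it easier to locate a stray NA or an empty population column.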
Output is given below.
                   [,1]
D_gamma           5.272
D_alpha2          4.679
D_alpha1          3.540
D_beta2           1.127
D_beta1           1.322
Proportion2       0.300
Proportion1       0.700
Differentiation2  0.204
Differentiation1  0.310
We give simple interpretations for the above output ("effective" is always in the sense of equally abundant alleles/populations/regions).
(1a) D_gamma = 5.272 is interpreted as that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha2 = 4.679 is interpreted as that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).
(1c) D_alpha1 = 3.540 is interpreted as that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 is interpreted as that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).
(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e. the mean proportion of non-shared alleles in a population is around 31.0%.
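The reported values can be cross-checked directly from the entropy formulas, without calling iDIP. The R sketch below recomputes the gamma and alpha entropies for the example data, assuming (as in the iDIP listing) that weights are relative population sizes; the variable names are illustrative. Up to rounding, it should reproduce the gamma, alpha, beta and differentiation values in the output table.

```r
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)                  # Region1 = Pop1-2, Region2 = Pop3-5

# Shannon entropy of a count vector (zeros are skipped).
entropy <- function(x) { p <- x[x > 0] / sum(x); -sum(p * log(p)) }

w_jk <- colSums(Data) / sum(Data)           # population weights
w_k  <- sapply(split(w_jk, region), sum)    # region weights
reg  <- sapply(sort(unique(region)),
               function(k) rowSums(Data[, region == k, drop = FALSE]))

H_gamma  <- entropy(rowSums(Data))
H_alpha2 <- sum(w_k  * apply(reg,  2, entropy))
H_alpha1 <- sum(w_jk * apply(Data, 2, entropy))

res <- c(D_gamma  = exp(H_gamma),
         D_alpha2 = exp(H_alpha2),
         D_alpha1 = exp(H_alpha1),
         D_beta2  = exp(H_gamma - H_alpha2),
         D_beta1  = exp(H_alpha2 - H_alpha1),
         Differentiation2 = (H_gamma - H_alpha2) / (-sum(w_k * log(w_k))),
         Differentiation1 = (H_alpha2 - H_alpha1) /
                            (-sum(w_jk * log(w_jk / w_k[as.character(region)]))))
round(res, 3)
```

Because the beta diversities are ratios of effective numbers, `D_alpha2 * D_beta2` recovers `D_gamma` exactly in floating point, whereas the rounded values in the table agree only to about three decimals.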
R code for Information-based Diversity Partitioning for Any Number of Levels

Input data:
  abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
  struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
Output:
  (1) Gamma (or total) diversity, alpha and beta diversity at each level
  (2) Proportion of total beta information found at each level
  (3) Differentiation (dissimilarity) for each level. For example, Differentiation1 measures the mean dissimilarity among populations (level 1) within a region, Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

Main function iDIP
iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga > 0] / n
  G = sum(-gp * log(gp))                       # gamma entropy
  S = length(gp)
  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)
  wi = colSums(abun) / n                       # population weights
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H - 1] = sum(wi * Ai)                      # alpha entropy at level 1
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) { ai[, j] = abun[, II]
        } else { ai[, j] = rowSums(abun[, II]) }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i - 1] = sum(wi * Ai)
    }
  }
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }
  Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha", (H - 1):1),
                     paste0("D_beta", (H - 1):1),
                     paste0("Proportion", (H - 1):1),
                     paste0("Differentiation", (H - 1):1))
  return(out)
}
Phylogenetic Diversity (R function iDIPphylo)
In addition to the two data matrices (called "Abun" and "Struc", respectively) as described for species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree format. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see the simple example given above for species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.
Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.
# Input data
Data = cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3), c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
rownames(Data) = paste0("Allele", 1:6)
Struc = rbind(rep("Ecosystem", 5), c(rep("Region1", 2), rep("Region2", 3)), paste0("Community", 1:5))
Tree=c((((Allele11666254448Allele22886156926)4370264926Allele35919367445)4349065302(Allele4967060281Allele54965919121Allele615361314)5492297125))
# Run R function
iDIPphylo(Data, Struc, Tree)
Output is given below.
                [,1]
Faith's PD   321.525
mean_T        94.169
PD_gamma     274.388
PD_alpha2    255.194
PD_alpha1    223.231
PD_beta2       1.075
PD_beta1       1.143
PD_prop2       0.351
PD_prop1       0.649
PD_diff2       0.124
PD_diff1       0.149
The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.
(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 is interpreted as that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha2 = 255.194 is interpreted as that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma).
(3c) PD_alpha1 = 223.231 is interpreted as that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2).
(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e. the mean proportion of non-shared lineages in a community is around 14.9%.
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

Input data:
  abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
  struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
  tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study
NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
Output:
  (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
  (2) mean_T: weighted (by species abundance) mean of the distances from the root node to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
  (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level
  (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information found in Level 1 and Level 2, respectively
  (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1 measures the mean phylogenetic dissimilarity among communities (Level 1) within a region (Level 2), PD_diff2 measures the mean phylogenetic dissimilarity among regions (Level 2), etc.

Three packages, "ade4", "ape" and "phytools", must be installed first.
install.packages("ade4"); library(ade4)
install.packages("ape"); library(ape)
install.packages("phytools"); library(phytools)

Main function iDIPphylo
iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, node] = 1 if the node lies on the path from the root to leaf i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }
  # pA: node-by-population abundances (each node sums the leaves descended from it)
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
  pB = c(phyloData$leaves, phyloData$nodes)    # branch lengths
  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n; TT = sum(gp * pB)               # TT: abundance-weighted mean tree depth
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])                         # Faith's PD
  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H - 1] = sum(wi * Ai)
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) { pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else { pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II]) }
      }
      # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))   # inactive: ai is not defined in this function
      wi = ni / sum(ni)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i - 1] = sum(wi * Ai)
    }
  }
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }
  # Gamma = exp(G) / TT; Alpha = exp(A) / TT   # inactive alternative scaling; the operator was lost in extraction and "/" is assumed
  Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  # out1 = iDIP(abun, struc); out = cbind(out1, out2)   # inactive: out2 is not defined
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H - 1):1),
                     paste0("PD_beta", (H - 1):1),
                     paste0("PD_prop", (H - 1):1),
                     paste0("PD_diff", (H - 1):1))
  return(out)
}
From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as
$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j\ln w_j} \qquad \text{(C1)}$$
This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem C2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.
(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 - A/S, the true proportion of non-shared species in a community.
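Property (B) can be illustrated numerically. The R sketch below builds N communities that each contain S equally common species, A of which are shared by all communities, and evaluates Eq. (C1) with equal community weights. The function name `shannon_diff_toy` and its arguments are hypothetical and serve only this illustration.

```r
# Shannon differentiation (Eq. C1) for N communities, each with S equally
# common species, A of which are shared by all communities.
shannon_diff_toy <- function(N, S, A) {
  n_unshared <- S - A
  abun <- matrix(0, nrow = A + N * n_unshared, ncol = N)
  if (A > 0) abun[seq_len(A), ] <- 1             # shared species: present everywhere
  for (j in seq_len(N)) {                        # unshared species: one block per community
    rows <- A + (j - 1) * n_unshared + seq_len(n_unshared)
    abun[rows, j] <- 1
  }
  entropy <- function(x) { p <- x[x > 0] / sum(x); -sum(p * log(p)) }
  w <- rep(1 / N, N)                             # equal weights (all communities have size S)
  H_gamma <- entropy(rowSums(abun))
  H_alpha <- sum(w * apply(abun, 2, entropy))
  (H_gamma - H_alpha) / (-sum(w * log(w)))
}

shannon_diff_toy(N = 3, S = 10, A = 4)   # compare with 1 - A/S = 0.6
```

In this setting $H_\alpha = \ln S$ and $H_\gamma = \ln S + (1 - A/S)\ln N$, so the ratio in Eq. (C1) reduces to $1 - A/S$ for any number of communities.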
Proof. For Part (A), see Appendix S6 in Chao, Jost et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy
Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as
$$H_\gamma = -\sum_{i=1}^{S} p_{i|++}\ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k},$$
$$H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk}.$$
(See Appendix B for all notation.) The well-known additive decomposition for Shannon entropy is
$$H_\gamma = H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right] \qquad \text{(C2)}$$
Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem C3. For the decomposition given in Eq. (C2) in the three-level hierarchy, we have the following inequalities:
$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k}\ln w_{+k} \qquad \text{(C3)}$$
$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}} \qquad \text{(C4)}$$
When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\ln w_{+k}$.
[1]
10
Faiths PD 321525
mean_T 94169
PD_gamma 274388
PD_alpha2 255194
PD_alpha1 223231
PD_beta2 1075
PD_beta1 1143
PD_prop2 0351
PD_prop1 0649
PD_diff2 0124
PD_diff1 0149
Theldquoeffectiverdquointhefollowinginterpretationisinthesenseofequallyabundantandequallydivergentlineagescommunitiesregions
(1)Thetotalbranchlength(FaithrsquosPD)inthephylogenetictreeis321525 (2)Theweighted(byspeciesabundance)meanofthedistancesfromrootnode
toeachofthetipsis94169
(3a)PD_gamma=274388 is interpreted as that the effective total branch length in
the ecosystem (total phylogenetic diversity) is 274388(3b)PD_alpha2=255194is interpreted as that the effective total branch length per
region is 255194
PD-beta2 = 1075 means that there are 108 region equivalents Thus 255194x1075=274388(=PD_gamma)
(3c)PD_alpha1=223231 is interpreted as that the effective total branch length per
population within each region is 223231PD_beta1 =1143 implies that there are 114 population equivalents per region
Here223231 x1143=255194(=PD_alpha2)
(4) PD_prop2=0351meansthattheproportionoftotalphylogeneticbetainformationfoundintheregionallevelis351
PD_prop1=0649meansthattheproportionoftotalphylogeneticbetainformationfoundinthecommunitylevelis649
(5) PD_diff2=0124impliesthatthemeanphylogeneticdifferentiationamong
regionsis0124Thiscanbeinterpretedasthefollowingeffectivesensethemeanproportionofnon-sharedlineagesinaregionisaround125
11
PD_diff1=0149impliesthatthemeanphylogeneticdifferentiationamongcommunitieswithinaregionis0149iethemeanproportionofnon-shared
lineagesinacommunityisaround149
RcodeforPhylogenetic-Information-BasedDecompositionforAnyNumberofLevelsinputdata
abunallelesbypopulationfrequencymatrixdata(orspeciesbycommunity frequencymatrixdata)struclevelbypopulationhierarchicalstructurematrix(orlevelbycommunity
hierarchicalstructurematrixtreeaNewick-formatphylogenetictreespannedbyallfocalspeciesconsidered inastudy
NOTEiDIPcannothandleldquoblankrdquoorldquoNArdquoentryYoumustreplaceblank orldquoNArdquoinyourdataby0oranynumericalvalueAlsotheremustbeatleast
onespeciesalleleinapopulationoracommunityoutput
(1)FaithrsquosPDthetotalsumofbranchlengthsofaphylogenetictree(2)meanTweighted(byspeciesabundance)meanofthedistancesfromrootnodetoeachofthetipsinaphylogenetictreeForanultrametrictree
meanT=treedepth(3)Gamma(ortotal)phylogeneticdiversity(PD)oforder1alphaandbetaPDforeachlevel
(4)PD_Prop1andPD_prop2measuretheproportionsoftotalphylogeneticbetainformationfoundinLevel1andLevel2respectively(5)Phylogeneticdifferentiation(dissimilarity)foreachlevelForexample
PD_diff1measures(level-1)themeanphylogeneticdissimilarityamongcommunities(Level1)withinaregion(Level2)PD_diff2measuresthemeanphylogeneticdissimilarityamongregions(Level2)etc
Threepackagesldquoade4rdquoldquoaperdquoandldquophytoolsrdquomustbeinstalledfirst
12
installpackages(ade4)library(ade4)
installpackages(ape)library(ape)installpackages(phytools)
library(phytools)MainfunctioniDIPphylo
iDIPphylo=function(abunstructree) phyloDatalt-newick2phylog(tree)
Templt-asmatrix(abun[names(phyloData$leaves)]) nodenames=c(names(phyloData$leaves)names(phyloData$nodes))
M=matrix(0nrow=length(phyloData$leaves)ncol=length(nodenames)dimnames=list(names(phyloData$leaves)nodenames))
for(iin1length(phyloData$leaves))M[i][unlist(phyloData$paths[i])]=rep(1length(unl
ist(phyloData$paths[i])))
pA=matrix(0ncol=ncol(abun)nrow=length(nodenames)dimnames=list(nodenamescolnames(abun))) for(iin1ncol(abun))pA[i]=Temp[i]M
pB=c(phyloData$leavesphyloData$nodes) n=sum(abun)N=ncol(abun)
ga=rowSums(pA) gp=ganTT=sum(gppB) G=sum(-pB[gpgt0]gp[gpgt0]TTlog(gp[gpgt0]TT))
PD=sum(pB[gpgt0])
13
H=nrow(struc)
A=numeric(H-1)W=numeric(H-1)B=numeric(H-1) Diff=numeric(H-1)Prop=numeric(H-1)
wi=colSums(abun)n W[H-1]=-sum(wi[wigt0]log(wi[wigt0])) pi=sapply(1Nfunction(k)pA[k]sum(abun[k]))
Ai=sapply(1Nfunction(k)-sum(pB[pi[k]gt0]pi[k][pi[k]gt0]TTlog(pi[k][pi[k]gt0]TT))) A[H-1]=sum(wiAi)
if(Hgt2) for(iin2(H-1))
I=unique(struc[i])NN=length(I) pi=matrix(0ncol=NNnrow=nrow(pA))ni=numeric(NN) for(jin1NN)
II=which(struc[i]==I[j]) if(length(II)==1)pi[j]=pA[II]sum(abun[II])ni[j]=sum(abun[II]) elsepi[j]=rowSums(pA[II])sum(abun[II])ni[j]=sum(abun[II])
pi=sapply(1NNfunction(k)ai[k]sum(ai[k])) wi=nisum(ni)
W[i-1]=-sum(wilog(wi)) Ai=sapply(1NNfunction(k)-sum(pB[pi[k]gt0]pi[k][pi[k]gt0]TTlog(pi[k][pi[k]gt0]TT)))
A[i-1]=sum(wiAi)
total=G-A[H-1] Diff[1]=(G-A[1])W[1] Prop[1]=(G-A[1])total
B[1]=exp(G)exp(A[1]) if(Hgt2)
14
for(iin2(H-1)) Diff[i]=(A[i-1]-A[i])(W[i]-W[i-1])
Prop[i]=(A[i-1]-A[i])total B[i]=exp(A[i-1])exp(A[i])
Gamma=exp(G)TTAlpha=exp(A)TTDiff=DiffProp=Prop Gamma=exp(G)Alpha=exp(A)Diff=DiffProp=Prop
out=matrix(c(PDTTGammaAlphaBPropDiff)ncol=1) out1=iDIP(abunstruc) out=cbind(out1out2)
rownames(out)lt-c(paste(FaithsPD) paste(mean_T)
paste0(PD_gamma) paste0(PD_alpha(H-1)1) paste0(PD_beta(H-1)1)
paste0(PD_prop(H-1)1) paste0(PD_diff(H-1)1) )
return(out)
The maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem C3. For the decomposition given in Eq. (C2) in the three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $\{w_{+1}, w_{+2}, \ldots, w_{+K}\}$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} = \sum_{k=1}^{K} w_{+k}\, p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S}\left(\sum_{k=1}^{K} w_{+k}\, p_{i|+k}\right) \ln\left(\sum_{m=1}^{K} w_{+m}\, p_{i|+m}\right).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem C1.
To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele i in population j, with population weights $\{w_{1k}/w_{+k},\, w_{2k}/w_{+k},\, \ldots,\, w_{J_k k}/w_{+k}\}$, $j = 1, 2, \ldots, J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region k, i.e.,

$$H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}, \quad \text{where } p_{i|+k} = \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}}\, p_{i|jk}.$$

The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}}\, H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem C1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}}\, H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln\frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}.$$
This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \tag{C6}$$
Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case we can decompose the gamma entropy as

$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln\left(w_{jk}\, p_{i|jk}\right) \\
&= -\sum_{i=1}^{S}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \left[\ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k}\right] \\
&= -\sum_{i=1}^{S}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln p_{i|jk} - \sum_{i=1}^{S}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln\frac{w_{jk}}{w_{+k}} - \sum_{i=1}^{S}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln w_{+k} \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right].
\end{aligned}$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
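The two extreme cases can be checked numerically. The R sketch below evaluates Eqs. (C5) and (C6) directly from a count matrix. The helper names `entropy` and `diff_three_level`, the toy matrices, and the choice of population weights proportional to population totals are illustrative assumptions, not part of the original supplement.

```r
# Shannon entropy of a relative-frequency vector (zero entries are skipped).
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Hypothetical helper: normalized differentiation among regions (Eq. C5)
# and among populations within regions (Eq. C6).
# abun   : alleles (rows) x populations (columns) count matrix
# region : region label of each population (one per column)
# Assumption: weights w_jk are proportional to population totals.
diff_three_level <- function(abun, region) {
  region <- as.character(region)
  w_jk <- colSums(abun) / sum(abun)            # population weights
  w_k  <- tapply(w_jk, region, sum)            # region weights w_{+k}
  H_gamma  <- entropy(rowSums(abun) / sum(abun))
  H_alpha1 <- sum(w_jk * apply(abun, 2, function(x) entropy(x / sum(x))))
  pooled   <- sapply(names(w_k), function(r)
    rowSums(abun[, region == r, drop = FALSE]))
  H_alpha2 <- sum(w_k * apply(pooled, 2, function(x) entropy(x / sum(x))))
  max2 <- entropy(w_k)                              # upper bound in Eq. C3
  max1 <- -sum(w_jk * log(w_jk / w_k[region]))      # upper bound in Eq. C4
  c(Delta2 = (H_gamma - H_alpha2) / max2,
    Delta1 = (H_alpha2 - H_alpha1) / max1)
}

regions  <- c("R1", "R1", "R2", "R2")
distinct <- diag(c(3, 5, 2, 4))                     # no shared alleles
same     <- cbind(c(1, 2, 3), c(2, 4, 6), c(1, 2, 3), c(3, 6, 9))

diff_three_level(distinct, regions)   # both measures at their maximum
diff_three_level(same, regions)       # both measures at their minimum
```

With completely distinct populations both measures reach 1, and with identical relative frequencies both are 0 up to floating-point error, in line with the theorem.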
As we proved in Theorem C3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem C2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. Shannon gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i} L_i\, a_{i|++} \ln a_{i|++},$$

$$I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i} L_i\, a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \sum_{i} L_i\, a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \left[I_\alpha^{(2)} - I_\alpha^{(1)}\right] + \left[I_\gamma - I_\alpha^{(2)}\right]. \tag{C7}$$
Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem C4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
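As a rough numerical illustration of the bound, the R sketch below works with a simplified two-level hierarchy (one ecosystem, two communities), the setting to which each differentiation measure reduces. The helper names `phylo_entropy` and `phylo_diff`, the four-tip tree with unit branch lengths, and the abundance matrices are assumptions made only for this example.

```r
# Phylogenetic entropy I = -sum_i L_i * a_i * ln(a_i), summed over branches.
# p : tip relative abundances; M : branches x tips descent matrix;
# L : branch lengths (same order as the rows of M).
phylo_entropy <- function(p, M, L) {
  a <- as.vector(M %*% p)          # branch (node) abundances
  keep <- a > 0
  -sum(L[keep] * a[keep] * log(a[keep]))
}

# Hypothetical helper: normalized phylogenetic differentiation between
# communities, (I_gamma - I_alpha) / (-T * sum(w * ln w)).
# Assumption: community weights are proportional to community totals.
phylo_diff <- function(abun, M, L) {
  w <- colSums(abun) / sum(abun)
  P <- sweep(abun, 2, colSums(abun), "/")      # within-community frequencies
  pooled <- as.vector(P %*% w)                 # ecosystem-level frequencies
  Tdepth <- sum(L * as.vector(M %*% pooled))   # mean root-to-tip distance
  I_gamma <- phylo_entropy(pooled, M, L)
  I_alpha <- sum(w * apply(P, 2, phylo_entropy, M = M, L = L))
  (I_gamma - I_alpha) / (-Tdepth * sum(w * log(w)))
}

# Ultrametric tree ((A:1,B:1):1,(C:1,D:1):1) with depth T = 2.
M <- rbind(diag(4),            # four tip branches
           c(1, 1, 0, 0),      # branch above the ancestor of A and B
           c(0, 0, 1, 1))      # branch above the ancestor of C and D
L <- rep(1, 6)

distinct <- cbind(c(4, 6, 0, 0), c(0, 0, 3, 7))   # no shared branches
same     <- cbind(c(1, 2, 3, 4), c(2, 4, 6, 8))   # identical frequencies

phylo_diff(distinct, M, L)   # upper bound attained
phylo_diff(same, M, L)       # lower bound attained
```

For an ultrametric tree the abundance-weighted mean root-to-tip distance equals the tree depth T, which is why `Tdepth` can stand in for T here.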
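SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue rectangles represent regions, and the red rectangle represents the ecosystem. Shannon entropies $H$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 panel labels: columns "Within", "Between", "Total", "Decomposition"; rows "3 Ecosystem", "2 Region", "1 Population or Community". Relative frequencies shown: $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$ (populations); $p_{i|+1}$, $p_{i|+2}$ (regions); $p_{i|++}$ (ecosystem). Entropies shown: $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$, $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$ and $H_\alpha^{(1)}$; $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$ and $H_\alpha^{(2)}$; $H_\gamma$.]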
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2 tree labels: tips $p_1$, $p_2$, $p_3$, $p_4$, $p_5$; interior nodes $p_1 + p_2$, $p_4 + p_5$, $p_1 + p_2 + p_3$; root MRCA.]
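The extended abundance set of Figure S2 can be built with a descent matrix, as in the short R sketch below. The numerical abundances are illustrative values chosen for this example, and the matrix name `M` is an assumption.

```r
# Tips 1..5 carry relative abundances p; each interior node carries the
# summed abundance of the tips descended from it.
p <- c(0.10, 0.20, 0.30, 0.25, 0.15)   # illustrative values, sum to 1

# Rows: the 8 nodes; columns: the 5 tips. A 1 marks "tip descends from node".
M <- rbind(diag(5),
           c(1, 1, 0, 0, 0),   # node 6: ancestor of tips 1 and 2
           c(1, 1, 1, 0, 0),   # node 7: ancestor of tips 1, 2 and 3
           c(0, 0, 0, 1, 1))   # node 8: ancestor of tips 4 and 5

a <- as.vector(M %*% p)        # extended set {a_1, ..., a_8}
a
```

The first five entries of `a` reproduce `p`, and the last three are `p1 + p2`, `p1 + p2 + p3` and `p4 + p5`.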
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as follows:

Ecosystem
    Region 1
        Population 1, Population 2
    Region 2
        Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

          Pop1  Pop2  Pop3  Pop4  Pop5
Allele1      1    16     2    10    15
Allele2      0     0     0     5    14
Allele3      7    12    11     1     0
Allele4      0     5    14     1    21
Allele5      2     1     0    11    10
Allele6      0     1     3     2     0
The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

          Pop1         Pop2         Pop3         Pop4         Pop5
Level 3   Ecosystem    Ecosystem    Ecosystem    Ecosystem    Ecosystem
Level 2   Region1      Region1      Region2      Region2      Region2
Level 1   Population1  Population2  Population3  Population4  Population5

For the above simple example, input data for the R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

# Run R function
iDIP(Data, Struc)
Output is given below.

                   [,1]
D_gamma           5.272
D_alpha2          4.679
D_alpha1          3.540
D_beta2           1.127
D_beta1           1.322
Proportion2       0.300
Proportion1       0.700
Differentiation2  0.204
Differentiation1  0.310
We give simple interpretations for the above output (throughout, "effective" is in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus, up to rounding, 4.679 x 1.127 = 5.272 (= D_gamma).
(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).
(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.310, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
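The effective numbers and their multiplicative relationships can be recomputed from first principles, independently of iDIP. The R sketch below does this for the example data; the variable names and the `shannon` helper are illustrative, and population weights are assumed proportional to population totals, as in the iDIP code.

```r
Data <- cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3),
              c(10,5,1,1,11,2), c(15,14,0,21,10,0))
region <- c(1, 1, 2, 2, 2)     # region membership of the five populations

# Shannon entropy of a count vector (zero counts are skipped).
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Level 1: populations, weighted by their share of the total count.
w_pop    <- colSums(Data) / sum(Data)
H_alpha1 <- sum(w_pop * apply(Data, 2, shannon))

# Level 2: pool the populations of each region, then weight by region share.
Reg      <- sapply(1:2, function(k) rowSums(Data[, region == k, drop = FALSE]))
w_reg    <- colSums(Reg) / sum(Reg)
H_alpha2 <- sum(w_reg * apply(Reg, 2, shannon))

# Level 3: the whole ecosystem.
H_gamma  <- shannon(rowSums(Data))

# Effective numbers (exponential of entropy) and multiplicative betas.
D_gamma  <- exp(H_gamma)
D_alpha2 <- exp(H_alpha2)
D_alpha1 <- exp(H_alpha1)
D_beta2  <- D_gamma / D_alpha2
D_beta1  <- D_alpha2 / D_alpha1
Proportion2 <- (H_gamma - H_alpha2) / (H_gamma - H_alpha1)

round(c(D_gamma = D_gamma, D_alpha2 = D_alpha2, D_alpha1 = D_alpha1,
        D_beta2 = D_beta2, D_beta1 = D_beta1, Proportion2 = Proportion2), 3)
```

By construction D_alpha1 x D_beta1 x D_beta2 equals D_gamma exactly, so any small mismatch in the rounded output above is rounding only.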
R code for Information-based Diversity Partitioning for Any Number of Levels

Input data:
abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)

Output:
(1) Gamma (or total) diversity, alpha and beta diversity at each level.
(2) Proportion of total beta information found at each level.
(3) Differentiation (dissimilarity) for each level. For example, Differentiation1 measures the mean dissimilarity among populations (level 1) within a region; Differentiation2 measures the mean dissimilarity among regions (level 2), etc.

NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
Main function iDIP

iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga>0]/n
  G = sum(-gp*log(gp))                  # gamma entropy
  S = length(gp)
  H = nrow(struc)                       # number of levels
  A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
  Diff = numeric(H-1); Prop = numeric(H-1)
  # lowest level (populations/communities)
  wi = colSums(abun)/n
  W[H-1] = -sum(wi[wi>0]*log(wi[wi>0]))
  pi = sapply(1:N, function(k) abun[,k]/sum(abun[,k]))
  Ai = sapply(1:N, function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])))
  A[H-1] = sum(wi*Ai)
  # intermediate levels: pool the populations that share a label in row i of struc
  if (H>2) {
    for (i in 2:(H-1)) {
      I = unique(struc[i,]); NN = length(I)
      ai = matrix(0, ncol=NN, nrow=nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i,]==I[j])
        if (length(II)==1) {ai[,j] = abun[,II]
        } else {ai[,j] = rowSums(abun[,II])}
      }
      pi = sapply(1:NN, function(k) ai[,k]/sum(ai[,k]))
      wi = colSums(ai)/sum(ai)
      W[i-1] = -sum(wi*log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])))
      A[i-1] = sum(wi*Ai)
    }
  }
  total = G-A[H-1]                      # total beta information
  Diff[1] = (G-A[1])/W[1]
  Prop[1] = (G-A[1])/total
  B[1] = exp(G)/exp(A[1])
  if (H>2) {
    for (i in 2:(H-1)) {
      Diff[i] = (A[i-1]-A[i])/(W[i]-W[i-1])
      Prop[i] = (A[i-1]-A[i])/total
      B[i] = exp(A[i-1])/exp(A[i])
    }
  }
  Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol=1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha", (H-1):1),
                     paste0("D_beta", (H-1):1),
                     paste0("Proportion", (H-1):1),
                     paste0("Differentiation", (H-1):1))
  return(out)
}
Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) described above for species diversity, we also need to input a phylogenetic "Tree" in Newick tree format.
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) = paste0("Allele",1:6)
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

# Run R function
iDIPphylo(Data, Struc, Tree)
Output is given below.

                [,1]
Faith's PD   321.525
mean_T        94.169
PD_gamma     274.388
PD_alpha2    255.194
PD_alpha1    223.231
PD_beta2       1.075
PD_beta1       1.143
PD_prop2       0.351
PD_prop1       0.649
PD_diff2       0.124
PD_diff1       0.149
The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus, up to rounding, 255.194 x 1.075 = 274.388 (= PD_gamma).
(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here, up to rounding, 223.231 x 1.143 = 255.194 (= PD_alpha2).
(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

Input data:
abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study

NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

Output:
(1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
(2) mean_T: weighted (by species abundance) mean of the distances from the root node to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
(3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level.
(4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information found in Level 1 and Level 2, respectively.
(5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1 measures the mean phylogenetic dissimilarity among communities (Level 1) within a region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among regions (Level 2), etc.

Three packages, "ade4", "ape" and "phytools", must be installed first.
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

Main function iDIPphylo

iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, b] = 1 if branch b lies on the path from the root to leaf i
  M = matrix(0, nrow=length(phyloData$leaves), ncol=length(nodenames),
             dimnames=list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i,][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }
  # pA: branch (node) abundances in each population/community
  pA = matrix(0, ncol=ncol(abun), nrow=length(nodenames),
              dimnames=list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) {pA[,i] = Temp[,i] %*% M}
  pB = c(phyloData$leaves, phyloData$nodes)   # branch lengths
  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga/n
  TT = sum(gp*pB)                             # mean_T
  G = sum(-pB[gp>0]*gp[gp>0]/TT*log(gp[gp>0]/TT))
  PD = sum(pB[gp>0])                          # Faith's PD
  H = nrow(struc)
  A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
  Diff = numeric(H-1); Prop = numeric(H-1)
  # lowest level (populations/communities)
  wi = colSums(abun)/n
  W[H-1] = -sum(wi[wi>0]*log(wi[wi>0]))
  pi = sapply(1:N, function(k) pA[,k]/sum(abun[,k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)))
  A[H-1] = sum(wi*Ai)
  # intermediate levels
  if (H>2) {
    for (i in 2:(H-1)) {
      I = unique(struc[i,]); NN = length(I)
      pi = matrix(0, ncol=NN, nrow=nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i,]==I[j])
        if (length(II)==1) {pi[,j] = pA[,II]/sum(abun[,II]); ni[j] = sum(abun[,II])
        } else {pi[,j] = rowSums(pA[,II])/sum(abun[,II]); ni[j] = sum(abun[,II])}
      }
      wi = ni/sum(ni)
      W[i-1] = -sum(wi*log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)))
      A[i-1] = sum(wi*Ai)
    }
  }
  total = G-A[H-1]
  Diff[1] = (G-A[1])/W[1]
  Prop[1] = (G-A[1])/total
  B[1] = exp(G)/exp(A[1])
  if (H>2) {
    for (i in 2:(H-1)) {
      Diff[i] = (A[i-1]-A[i])/(W[i]-W[i-1])
      Prop[i] = (A[i-1]-A[i])/total
      B[i] = exp(A[i-1])/exp(A[i])
    }
  }
  Gamma = exp(G)*TT; Alpha = exp(A)*TT; Diff = Diff; Prop = Prop
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol=1)
  rownames(out) <- c(paste("Faith's PD"), paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H-1):1),
                     paste0("PD_beta", (H-1):1),
                     paste0("PD_prop", (H-1):1),
                     paste0("PD_diff", (H-1):1))
  return(out)
}
1
SUPPLEMENTARYINFORMATION
FigureS1SchematicrepresentationofthecalculationofdiversitiesateachlevelofthehierarchyThefivegreenrectanglesrepresentlocalpopulationsthebluerepresentregionsandtheredrepresenttheecosystemShannon
entropies 119867lowast arecalculatedfromallelespeciesabundancesateach
levelhofthehierarchyandarethentransformedintoeffectivenumbersusingeq1
Within Between Total Decomposition
3 Ecosystem minus minus
2 Region
1 Population or Community
pi|+1 pi|+2
pi|++
Ha1(2)
Ha2(2)
Ha11(1)
Ha21(1)
Ha31(1)
Ha42(1)
Ha52(1)
Ha(1)
Ha(2)Hg
pi|11 pi|31pi|21 pi|42 pi|52
= amp ( ) = +
= amp (
=
= amp (
) = + =
= ) )
= )
= )
2
FigureS2Exampleofanultrametrictreewheretheterminalnodesrepresentspeciesalleleswithassociatedabundancesandtheinteriornodes
representingspeciationcoalescenteventsInthiscasethecalculationofeffectivenumbersisbasedonandextendedsetwherethefirstelements(inthepresentexample5)correspondtospeciesallelesabundancesandall
otherabundancescorrespondtotheabundanceoftheelementsdescendedfromtheinternalnodesInthepresentexamplethesetofabundancesfor8nodesisasfollows 119886119886⋯ 119886119886119886119886 = 119901119901⋯ 119901 119901 + 119901 119901 +
119901 + 119901 119901 + 119901 where 119901 istherelativeabundanceofspeciesi
p1 p3p2 p4 p5
p1 + p2
p4 + p5
p1 + p2 + p3
MRCA
3
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R function iDIP)

Input should include two data matrices (called “Abun” and “Struc”, respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in each population or community.
(b) Struc: specifying a hierarchical structure matrix; see the simple example below.

Our R code can be applied to any number of levels. For simplicity, we use a three-level hierarchical structure to illustrate how to input data. Suppose there are two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as follows:

    Ecosystem
      Region1
        Population1  Population2
      Region2
        Population3  Population4  Population5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

             Pop1  Pop2  Pop3  Pop4  Pop5
    Allele1     1    16     2    10    15
    Allele2     0     0     0     5    14
    Allele3     7    12    11     1     0
    Allele4     0     5    14     1    21
    Allele5     2     1     0    11    10
    Allele6     0     1     3     2     0
The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure with any number of levels can be expressed in a similar manner.

            Pop1         Pop2         Pop3         Pop4         Pop5
    Level3  Ecosystem    Ecosystem    Ecosystem    Ecosystem    Ecosystem
    Level2  Region1      Region1      Region2      Region2      Region2
    Level1  Population1  Population2  Population3  Population4  Population5
For the above simple example, the input data for the R function iDIP (code given below) are as follows:

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

    # Run R function
    iDIP(Data, Struc)
Output is given below:

                        [,1]
    D_gamma            5.272
    D_alpha.2          4.679
    D_alpha.1          3.540
    D_beta.2           1.127
    D_beta.1           1.322
    Proportion.2       0.300
    Proportion.1       0.700
    Differentiation.2  0.204
    Differentiation.1  0.310
We give simple interpretations of the above output (“effective” is always in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha.2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta.2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).
(1c) D_alpha.1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta.1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha.2).
(2) Proportion.2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion.1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation.2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. In the effective sense, this means that the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation.1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
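The arithmetic behind interpretations (1b), (1c), and (2) can be checked directly from the reported values. The sketch below is not part of the published code; it re-enters the rounded output by hand and uses the fact, visible in the iDIP listing, that each proportion equals the log of a beta diversity divided by the total beta information log(D_gamma / D_alpha.1). Agreement is only up to rounding of the three-decimal output.

```r
# Rounded values copied from the example output
out <- c(D_gamma = 5.272, D_alpha.2 = 4.679, D_alpha.1 = 3.540,
         D_beta.2 = 1.127, D_beta.1 = 1.322)

# Multiplicative decomposition: gamma = alpha.2 * beta.2, alpha.2 = alpha.1 * beta.1
gamma_from_region <- out[["D_alpha.2"]] * out[["D_beta.2"]]
alpha2_from_pop   <- out[["D_alpha.1"]] * out[["D_beta.1"]]

# Share of the total beta information carried by each level
total_info <- log(out[["D_gamma"]] / out[["D_alpha.1"]])
prop <- c(Proportion.2 = log(out[["D_beta.2"]]),
          Proportion.1 = log(out[["D_beta.1"]])) / total_info

print(round(c(gamma_from_region, alpha2_from_pop), 3))
print(round(prop, 2))
```

Small discrepancies in the third decimal place are expected because the inputs are themselves rounded.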
    # R code for Information-based Diversity Partitioning for Any Number of Levels
    # Input data:
    #   abun:  alleles by population frequency matrix (or species by community frequency matrix)
    #   struc: level by population hierarchical structure matrix (or level by community
    #          hierarchical structure matrix)
    # Output:
    #   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
    #   (2) Proportion of total beta information found at each level.
    #   (3) Differentiation (dissimilarity) for each level. For example, Differentiation.1
    #       measures the mean dissimilarity among populations (level 1) within a region;
    #       Differentiation.2 measures the mean dissimilarity among regions (level 2), etc.
    # NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
    #       or "NA" in your data by 0 or any numerical value. Also, there must be at least
    #       one species/allele in each population or community.

    # Main function iDIP
    iDIP = function(abun, struc) {
      n = sum(abun); N = ncol(abun)
      ga = rowSums(abun)
      gp = ga[ga > 0] / n
      G = sum(-gp * log(gp))                  # gamma (ecosystem) Shannon entropy
      S = length(gp)
      H = nrow(struc)                         # number of hierarchical levels
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      # Lowest level: weighted mean entropy within populations
      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[H - 1] = sum(wi * Ai)

      # Intermediate levels: pool the populations belonging to each group
      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          ai = matrix(0, ncol = NN, nrow = nrow(abun))
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              ai[, j] = abun[, II]
            } else {
              ai[, j] = rowSums(abun[, II])
            }
          }
          pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
          wi = colSums(ai) / sum(ai)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
          A[i - 1] = sum(wi * Ai)
        }
      }

      # Beta information, proportions, and normalized differentiation
      total = G - A[H - 1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      Gamma = exp(G); Alpha = exp(A)
      out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) <- c(paste0("D_gamma"),
                         paste0("D_alpha.", (H - 1):1),
                         paste0("D_beta.", (H - 1):1),
                         paste0("Proportion.", (H - 1):1),
                         paste0("Differentiation.", (H - 1):1))
      return(out)
    }
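Because iDIP does not itself guard against the input problems listed in its NOTE, a small pre-check can be run before calling it. The helper below, `check_idip_input`, is hypothetical and not part of the published code; it is a minimal sketch that enforces only the stated requirements (no NA entries, at least one allele or species per column) plus a column-count match between the two matrices.

```r
# Hypothetical helper (not part of the published iDIP code)
check_idip_input <- function(abun, struc) {
  abun <- as.matrix(abun)
  stopifnot(is.numeric(abun))
  if (anyNA(abun)) stop("abun contains NA; replace blank/NA entries by 0 first")
  if (any(abun < 0)) stop("abun contains negative frequencies")
  if (any(colSums(abun) == 0)) stop("each population/community needs at least one allele/species")
  if (ncol(struc) != ncol(abun)) stop("struc and abun must have one column per population")
  invisible(TRUE)
}

# Example data from the text
Data <- cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3),
              c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc <- rbind(rep("Ecosystem", 5),
               c(rep("Region1", 2), rep("Region2", 3)),
               paste0("Population", 1:5))

check_idip_input(Data, Struc)  # passes silently for the worked example
```

Stopping with an explicit message is a design choice made here for clarity; silently replacing NA by 0 would also satisfy the NOTE but could hide data-entry errors.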
Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called “Abun” and “Struc”, respectively) described for species diversity, we also need to input a phylogenetic “Tree” in Newick format.
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the “Abun” matrix must be exactly the same as those in the uploaded Newick tree. iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in each population or community.
(b) Struc: specifying a hierarchical structure matrix; see the simple example given above for species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for the 6 alleles is given below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    rownames(Data) = paste0("Allele",1:6)
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
    Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

    # Run R function
    iDIPphylo(Data, Struc, Tree)
Output is given below:

                   [,1]
    Faith's PD  321.525
    mean_T       94.169
    PD_gamma    274.388
    PD_alpha.2  255.194
    PD_alpha.1  223.231
    PD_beta.2     1.075
    PD_beta.1     1.143
    PD_prop.2     0.351
    PD_prop.1     0.649
    PD_diff.2     0.124
    PD_diff.1     0.149
“Effective” in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith’s PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha.2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta.2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma).
(3c) PD_alpha.1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta.1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha.2).
(4) PD_prop.2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop.1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff.2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. In the effective sense, this means that the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff.1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
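Items (1) and (2) can be reproduced without any phylogenetics package by summing branch lengths by hand. The sketch below is an illustration rather than the authors' procedure: the branch lengths are read from the example Newick string, with decimal points placed where they reproduce the reported Faith's PD and mean_T, and the names `clade12`, `clade123`, and `clade456` are labels introduced here for the three interior branches.

```r
# Allele counts from the worked example (rows = alleles, columns = populations)
Data <- cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3),
              c(10,5,1,1,11,2), c(15,14,0,21,10,0))
p <- rowSums(Data) / sum(Data)   # pooled relative abundance of each allele

# Terminal branch lengths for Allele1..Allele6
tip_len <- c(16.66254448, 28.86156926, 59.19367445,
             9.67060281, 49.65919121, 15.361314)
# Interior branch lengths: (Allele1,Allele2), ((Allele1,Allele2),Allele3),
# and (Allele4,Allele5,Allele6)
clade12  <- 43.70264926
clade123 <- 43.49065302
clade456 <- 54.92297125

# Faith's PD: total branch length of the tree
faith_pd <- sum(tip_len) + clade12 + clade123 + clade456

# mean_T: abundance-weighted mean root-to-tip distance
root_to_tip <- tip_len + c(clade12 + clade123, clade12 + clade123, clade123,
                           clade456, clade456, clade456)
mean_T <- sum(p * root_to_tip)

print(round(c(faith_pd = faith_pd, mean_T = mean_T), 3))
```

Note that the root-to-tip distances differ between tips, so this simulated tree is not ultrametric, which is why mean_T is an abundance-weighted average rather than a single tree depth.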
    # R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
    # Input data:
    #   abun:  alleles by population frequency matrix (or species by community frequency matrix)
    #   struc: level by population hierarchical structure matrix (or level by community
    #          hierarchical structure matrix)
    #   tree:  a Newick-format phylogenetic tree spanned by all focal species considered in a study
    # NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA"
    #       in your data by 0 or any numerical value. Also, there must be at least
    #       one species/allele in each population or community.
    # Output:
    #   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
    #   (2) mean_T: weighted (by species abundance) mean of the distances from the root node
    #       to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
    #   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level.
    #   (4) PD_prop.1 and PD_prop.2 measure the proportions of total phylogenetic beta
    #       information found in Level 1 and Level 2, respectively.
    #   (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff.1
    #       measures the mean phylogenetic dissimilarity among communities (Level 1) within a
    #       region (Level 2); PD_diff.2 measures the mean phylogenetic dissimilarity among
    #       regions (Level 2), etc.
    # Three packages, "ade4", "ape", and "phytools", must be installed first.

    install.packages("ade4");     library(ade4)
    install.packages("ape");      library(ape)
    install.packages("phytools"); library(phytools)

    # Main function iDIPphylo
    iDIPphylo = function(abun, struc, tree) {
      phyloData <- newick2phylog(tree)
      Temp <- as.matrix(abun[names(phyloData$leaves), ])
      nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
      # M[i, j] = 1 if node j lies on the path from the root to leaf i
      M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
                 dimnames = list(names(phyloData$leaves), nodenames))
      for (i in 1:length(phyloData$leaves)) {
        M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
      }

      # pA: node-by-population abundances; pB: branch length of each node
      pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
                  dimnames = list(nodenames, colnames(abun)))
      for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
      pB = c(phyloData$leaves, phyloData$nodes)
      n = sum(abun); N = ncol(abun)
      ga = rowSums(pA)
      gp = ga / n; TT = sum(gp * pB)
      G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
      PD = sum(pB[gp > 0])

      H = nrow(struc)
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[H - 1] = sum(wi * Ai)

      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
            } else {
              pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
            }
          }
          # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
          wi = ni / sum(ni)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
          A[i - 1] = sum(wi * Ai)
        }
      }

      total = G - A[H - 1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      Gamma = exp(G) * TT; Alpha = exp(A) * TT
      # Gamma = exp(G); Alpha = exp(A)
      out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
      # out1 = iDIP(abun, struc); out = cbind(out1, out2)
      rownames(out) <- c(paste("Faith's PD"), paste("mean_T"),
                         paste0("PD_gamma"),
                         paste0("PD_alpha.", (H - 1):1),
                         paste0("PD_beta.", (H - 1):1),
                         paste0("PD_prop.", (H - 1):1),
                         paste0("PD_diff.", (H - 1):1))
      return(out)
    }
1
SUPPLEMENTARYINFORMATION
FigureS1SchematicrepresentationofthecalculationofdiversitiesateachlevelofthehierarchyThefivegreenrectanglesrepresentlocalpopulationsthebluerepresentregionsandtheredrepresenttheecosystemShannon
entropies 119867lowast arecalculatedfromallelespeciesabundancesateach
levelhofthehierarchyandarethentransformedintoeffectivenumbersusingeq1
Within Between Total Decomposition
3 Ecosystem minus minus
2 Region
1 Population or Community
pi|+1 pi|+2
pi|++
Ha1(2)
Ha2(2)
Ha11(1)
Ha21(1)
Ha31(1)
Ha42(1)
Ha52(1)
Ha(1)
Ha(2)Hg
pi|11 pi|31pi|21 pi|42 pi|52
= amp ( ) = +
= amp (
=
= amp (
) = + =
= ) )
= )
= )
2
FigureS2Exampleofanultrametrictreewheretheterminalnodesrepresentspeciesalleleswithassociatedabundancesandtheinteriornodes
representingspeciationcoalescenteventsInthiscasethecalculationofeffectivenumbersisbasedonandextendedsetwherethefirstelements(inthepresentexample5)correspondtospeciesallelesabundancesandall
otherabundancescorrespondtotheabundanceoftheelementsdescendedfromtheinternalnodesInthepresentexamplethesetofabundancesfor8nodesisasfollows 119886119886⋯ 119886119886119886119886 = 119901119901⋯ 119901 119901 + 119901 119901 +
119901 + 119901 119901 + 119901 where 119901 istherelativeabundanceofspeciesi
p1 p3p2 p4 p5
p1 + p2
p4 + p5
p1 + p2 + p3
MRCA
3
RcodeforInformation-basedDiversityPartitioning(iDIP)UnderMulti-LevelHierarchicalStructuresforSpeciesandPhylogeneticDiversities
SpeciesAllelicDiversity(RFunctioniDIP) Inputshouldincludetwodatamatrices(calledldquoAbunrdquoandldquoStrucrdquorespectively)(a) Abunspecifyingspeciesalleles(rows)raworrelativefrequenciesineach
populationcommunity(columns)iDIPcannothandleldquoblankrdquoorldquoNArdquoentry
YoumustreplaceldquoblankrdquoorldquoNArdquoinyourdataby0oranynumericalvalueAlsotheremustbeatleastonespeciesalleleinapopulationoracommunity
(b) Strucspecifyingahierarchicalstructurematrixseeasimpleexamplebelow OurRcodecanbeappliedtoanynumberoflevelsForsimplicitywejustusea
three-levelhierarchicalstructuretoillustratehowtoinputdataConsidertherearetworegions(1and2)inanecosystemInRegion1therearetwopopulationsandinRegion2therearethreepopulationsThehierarchicalstructureis
displayedasthefollowing
Supposetherawallelefrequenciesaregiveninthefollowingmatrix(allelesin
rowsandpopulationsincolumns)
Ecosystem13
Region113
Population113 Population213
Region213
Population313 Population413 Population513
4
Pop1 Pop2 Pop3 Pop4 Pop5
Allele1 1 16 2 10 15
Allele2 0 0 0 5 14
Allele3 7 12 11 1 0
Allele4 0 5 14 1 21
Allele5 2 1 0 11 10
Allele6 0 1 3 2 0
Thehierarchicalstructurematrixforthissimpleexampleshouldbeinputasamatrixwithlevelsinrowsandpopulationsincolumns(Level1=populationlevelLevel2=regionlevelLevel3=ecosystemlevel)Hierarchicalstructureof
anynumberoflevelscanbeexpressedinasimilarmanner
Pop1 Pop2 Pop3 Pop4 Pop5
Level3 Ecosystem Ecosystem Ecosystem Ecosystem Ecosystem
Level2 Region1 Region1 Region2 Region2 Region2
Level1 Population1 Population2 Population3 Population4 Population5
FortheabovesimpleexampleinputdataforRfunctioniDIP(codegivenbelow)
areshownbelow InputdataData=cbind(c(107020)c(16012511)c(20111403)c(10511112)c(151
4021100))Struc=rbind(rep(Ecosystem5)c(rep(Region12)rep(Region23))paste0(Population15))
RunRfunction iDIP(DataStruc)
5
Output is given below
[1]
D_gamma 5272
D_alpha2 4679
D_alpha1 3540
D_beta2 1127
D_beta1 1322
Proportion2 0300
Proportion1 0700
Differentiation2 0204
Differentiation1 0310
We give simple interpretations for the above output (all the ldquoeffectiverdquo is in asenseofequallyabundantallelespopulationsregions)(1a) D_gamma = 5272 is interpreted as that the effective number of alleles in the
ecosystem (total diversity) is 5272
(1b) D_alpha2 = 4679 is interpreted as that each region contains 4679 allele
equivalents
D-beta2 = 1127 implies that there are 113 region equivalents Thus 4679 x
1127 = 5272 (=D_gamma)
(1c) D_alpha1 = 3540 is interpreted as that each population within a region contains
3540 allele equivalents
D_beta1 =1322 is interpreted as that there are 132 population equivalents per
region
Here 1322 x 3540 = 4679 species per region (= D_alpha2)
(2) Proportion2 = 030 means that the proportion of total beta information found at
the regional level is 30
Proportion1 = 070 means that the proportion of total beta information found at
the population level is 70
(3) Differentiation2 =0204 implies that the mean differentiationdissimilarity among
regions is 0204 This can be interpreted as the following effective sense the mean
proportion of non-shared alleles in a region is around 204
Differentiation1 =0310 implies that the mean differentiationdissimilarity among
6
populations within a region is 031 ie the mean proportion of non-shared alleles
in a population is around 310
RcodeforInformation-basedDiversityPartitioningforAnyNumberofLevelsinputdata
abunallelesbypopulationfrequencymatrixdata(orspeciesby communityfrequencymatrixdata) struclevelbypopulationhierarchicalstructurematrix(orlevelbycommunity
hierarchicalstructurematrix)output
(1)Gamma(ortotal)diversityalphaandbetadiversityateachlevel (2)Proportionoftotalbetainformationfoundateachlevel(3)Differentiation(dissimilarity)foreachlevelForexampleDifferentiation1
measuresthemeandissimilarityamongpopulations(level1)withinaregionDifferentiation2measuresthemeandissimilarityamongregions(level2)etc
NOTEiDIPcannothandleldquoblankrdquodataorldquoNArdquoentryYoumustreplaceldquoblankrdquo orldquoNArdquoinyourdataby0oranynumericalvalueAlsotheremustbeatleast onespeciesalleleinapopulationoracommunity
MainfunctioniDIP
iDIP=function(abunstruc) n=sum(abun)N=ncol(abun) ga=rowSums(abun)
gp=ga[gagt0]n G=sum(-gplog(gp)) S=length(gp)
H=nrow(struc) A=numeric(H-1)W=numeric(H-1)B=numeric(H-1)
7
Diff=numeric(H-1)Prop=numeric(H-1)
wi=colSums(abun)n W[H-1]=-sum(wi[wigt0]log(wi[wigt0])) pi=sapply(1Nfunction(k)abun[k]sum(abun[k]))
Ai=sapply(1Nfunction(k)-sum(pi[k][pi[k]gt0]log(pi[k][pi[k]gt0]))) A[H-1]=sum(wiAi) if(Hgt2)
for(iin2(H-1)) I=unique(struc[i])NN=length(I) ai=matrix(0ncol=NNnrow=nrow(abun))
for(jin1NN) II=which(struc[i]==I[j]) if(length(II)==1)ai[j]=abun[II]
elseai[j]=rowSums(abun[II]) pi=sapply(1NNfunction(k)ai[k]sum(ai[k]))
wi=colSums(ai)sum(ai) W[i-1]=-sum(wilog(wi)) Ai=sapply(1NNfunction(k)-sum(pi[k][pi[k]gt0]log(pi[k][pi[k]gt0])))
A[i-1]=sum(wiAi)
total=G-A[H-1] Diff[1]=(G-A[1])W[1] Prop[1]=(G-A[1])total
B[1]=exp(G)exp(A[1]) if(Hgt2) for(iin2(H-1))
Diff[i]=(A[i-1]-A[i])(W[i]-W[i-1]) Prop[i]=(A[i-1]-A[i])total B[i]=exp(A[i-1])exp(A[i])
Gamma=exp(G)Alpha=exp(A)Diff=DiffProp=Prop
8
out=matrix(c(GammaAlphaBPropDiff)ncol=1) rownames(out)lt-c(paste0(D_gamma)
paste0(D_alpha(H-1)1) paste0(D_beta(H-1)1) paste0(Proportion(H-1)1)
paste0(Differentiation(H-1)1) ) return(out)
9
PhylogeneticDiversity(RfunctioniDIPphylo)
Inadditiontothetwodatamatrices(calledldquoAbunrdquoldquoStrucrdquorespectively)asdescribedinthespeciesdiversitywealsoneedtoinputaphylogeneticldquoTreerdquoinNewicktreeformat)
(a) Abunspecifyingspeciesalleles(rows)raworrelativefrequenciesineachpopulationcommunity(columns) NOTEspeciesnamesintheldquoAbunrdquomatrixshouldbeexactlythesameas
thoseintheuploadedNewicktreeformatiDIPcannothandleldquoblankrdquoorldquoNArdquoentryYoumustreplaceldquoblankrdquoorldquoNArdquoinyourdataby0oranynumericalvalueAlsotheremustbeatleastonespeciesalleleinapopulationora
community(b) Strucspecifyinghierarchicalstructurematrixseethesimpleexamplegiven
aboveforthespeciesdiversity
(c) Treeaphylogenetictreespannedbyallspeciesconsideredinthestudy
Hereweusethesamehierarchicalstructureandalleleabundancesdataasinthe
speciesdiversityforillustrationAsimulatedphylogenetictreefor6speciesaregivenbelow
InputdataData=cbind(c(107020)c(16012511)c(20111403)c(10511112)c(1514021100))
rownames(Data)=paste0(Allele16)Struc=rbind(rep(Ecosystem5)c(rep(Region12)rep(Region23))paste0(Community15))
Tree=c((((Allele11666254448Allele22886156926)4370264926Allele35919367445)4349065302(Allele4967060281Allele54965919121Allele615361314)5492297125))
RunRfunction iDIPphylo(DataStrucTree)
Output is given below
[1]
10
Faiths PD 321525
mean_T 94169
PD_gamma 274388
PD_alpha2 255194
PD_alpha1 223231
PD_beta2 1075
PD_beta1 1143
PD_prop2 0351
PD_prop1 0649
PD_diff2 0124
PD_diff1 0149
Theldquoeffectiverdquointhefollowinginterpretationisinthesenseofequallyabundantandequallydivergentlineagescommunitiesregions
(1)Thetotalbranchlength(FaithrsquosPD)inthephylogenetictreeis321525 (2)Theweighted(byspeciesabundance)meanofthedistancesfromrootnode
toeachofthetipsis94169
(3a)PD_gamma=274388 is interpreted as that the effective total branch length in
the ecosystem (total phylogenetic diversity) is 274388(3b)PD_alpha2=255194is interpreted as that the effective total branch length per
region is 255194
PD-beta2 = 1075 means that there are 108 region equivalents Thus 255194x1075=274388(=PD_gamma)
(3c)PD_alpha1=223231 is interpreted as that the effective total branch length per
population within each region is 223231PD_beta1 =1143 implies that there are 114 population equivalents per region
Here223231 x1143=255194(=PD_alpha2)
(4) PD_prop2=0351meansthattheproportionoftotalphylogeneticbetainformationfoundintheregionallevelis351
PD_prop1=0649meansthattheproportionoftotalphylogeneticbetainformationfoundinthecommunitylevelis649
(5) PD_diff2=0124impliesthatthemeanphylogeneticdifferentiationamong
regionsis0124Thiscanbeinterpretedasthefollowingeffectivesensethemeanproportionofnon-sharedlineagesinaregionisaround125
11
PD_diff1=0149impliesthatthemeanphylogeneticdifferentiationamongcommunitieswithinaregionis0149iethemeanproportionofnon-shared
lineagesinacommunityisaround149
RcodeforPhylogenetic-Information-BasedDecompositionforAnyNumberofLevelsinputdata
abunallelesbypopulationfrequencymatrixdata(orspeciesbycommunity frequencymatrixdata)struclevelbypopulationhierarchicalstructurematrix(orlevelbycommunity
hierarchicalstructurematrixtreeaNewick-formatphylogenetictreespannedbyallfocalspeciesconsidered inastudy
NOTEiDIPcannothandleldquoblankrdquoorldquoNArdquoentryYoumustreplaceblank orldquoNArdquoinyourdataby0oranynumericalvalueAlsotheremustbeatleast
onespeciesalleleinapopulationoracommunityoutput
(1)FaithrsquosPDthetotalsumofbranchlengthsofaphylogenetictree(2)meanTweighted(byspeciesabundance)meanofthedistancesfromrootnodetoeachofthetipsinaphylogenetictreeForanultrametrictree
meanT=treedepth(3)Gamma(ortotal)phylogeneticdiversity(PD)oforder1alphaandbetaPDforeachlevel
(4)PD_Prop1andPD_prop2measuretheproportionsoftotalphylogeneticbetainformationfoundinLevel1andLevel2respectively(5)Phylogeneticdifferentiation(dissimilarity)foreachlevelForexample
PD_diff1measures(level-1)themeanphylogeneticdissimilarityamongcommunities(Level1)withinaregion(Level2)PD_diff2measuresthemeanphylogeneticdissimilarityamongregions(Level2)etc
Threepackagesldquoade4rdquoldquoaperdquoandldquophytoolsrdquomustbeinstalledfirst
12
installpackages(ade4)library(ade4)
installpackages(ape)library(ape)installpackages(phytools)
library(phytools)MainfunctioniDIPphylo
iDIPphylo=function(abunstructree) phyloDatalt-newick2phylog(tree)
Templt-asmatrix(abun[names(phyloData$leaves)]) nodenames=c(names(phyloData$leaves)names(phyloData$nodes))
M=matrix(0nrow=length(phyloData$leaves)ncol=length(nodenames)dimnames=list(names(phyloData$leaves)nodenames))
for(iin1length(phyloData$leaves))M[i][unlist(phyloData$paths[i])]=rep(1length(unl
ist(phyloData$paths[i])))
pA=matrix(0ncol=ncol(abun)nrow=length(nodenames)dimnames=list(nodenamescolnames(abun))) for(iin1ncol(abun))pA[i]=Temp[i]M
pB=c(phyloData$leavesphyloData$nodes) n=sum(abun)N=ncol(abun)
ga=rowSums(pA) gp=ganTT=sum(gppB) G=sum(-pB[gpgt0]gp[gpgt0]TTlog(gp[gpgt0]TT))
PD=sum(pB[gpgt0])
13
H=nrow(struc)
A=numeric(H-1)W=numeric(H-1)B=numeric(H-1) Diff=numeric(H-1)Prop=numeric(H-1)
wi=colSums(abun)n W[H-1]=-sum(wi[wigt0]log(wi[wigt0])) pi=sapply(1Nfunction(k)pA[k]sum(abun[k]))
Ai=sapply(1Nfunction(k)-sum(pB[pi[k]gt0]pi[k][pi[k]gt0]TTlog(pi[k][pi[k]gt0]TT))) A[H-1]=sum(wiAi)
if(Hgt2) for(iin2(H-1))
I=unique(struc[i])NN=length(I) pi=matrix(0ncol=NNnrow=nrow(pA))ni=numeric(NN) for(jin1NN)
II=which(struc[i]==I[j]) if(length(II)==1)pi[j]=pA[II]sum(abun[II])ni[j]=sum(abun[II]) elsepi[j]=rowSums(pA[II])sum(abun[II])ni[j]=sum(abun[II])
pi=sapply(1NNfunction(k)ai[k]sum(ai[k])) wi=nisum(ni)
W[i-1]=-sum(wilog(wi)) Ai=sapply(1NNfunction(k)-sum(pB[pi[k]gt0]pi[k][pi[k]gt0]TTlog(pi[k][pi[k]gt0]TT)))
A[i-1]=sum(wiAi)
total=G-A[H-1] Diff[1]=(G-A[1])W[1] Prop[1]=(G-A[1])total
B[1]=exp(G)exp(A[1]) if(Hgt2)
14
for(iin2(H-1)) Diff[i]=(A[i-1]-A[i])(W[i]-W[i-1])
Prop[i]=(A[i-1]-A[i])total B[i]=exp(A[i-1])exp(A[i])
Gamma=exp(G)TTAlpha=exp(A)TTDiff=DiffProp=Prop Gamma=exp(G)Alpha=exp(A)Diff=DiffProp=Prop
out=matrix(c(PDTTGammaAlphaBPropDiff)ncol=1) out1=iDIP(abunstruc) out=cbind(out1out2)
rownames(out)lt-c(paste(FaithsPD) paste(mean_T)
paste0(PD_gamma) paste0(PD_alpha(H-1)1) paste0(PD_beta(H-1)1)
paste0(PD_prop(H-1)1) paste0(PD_diff(H-1)1) )
return(out)
1
SUPPLEMENTARYINFORMATION
FigureS1SchematicrepresentationofthecalculationofdiversitiesateachlevelofthehierarchyThefivegreenrectanglesrepresentlocalpopulationsthebluerepresentregionsandtheredrepresenttheecosystemShannon
entropies 119867lowast arecalculatedfromallelespeciesabundancesateach
levelhofthehierarchyandarethentransformedintoeffectivenumbersusingeq1
Within Between Total Decomposition
3 Ecosystem minus minus
2 Region
1 Population or Community
pi|+1 pi|+2
pi|++
Ha1(2)
Ha2(2)
Ha11(1)
Ha21(1)
Ha31(1)
Ha42(1)
Ha52(1)
Ha(1)
Ha(2)Hg
pi|11 pi|31pi|21 pi|42 pi|52
= amp ( ) = +
= amp (
=
= amp (
) = + =
= ) )
= )
= )
2
FigureS2Exampleofanultrametrictreewheretheterminalnodesrepresentspeciesalleleswithassociatedabundancesandtheinteriornodes
representingspeciationcoalescenteventsInthiscasethecalculationofeffectivenumbersisbasedonandextendedsetwherethefirstelements(inthepresentexample5)correspondtospeciesallelesabundancesandall
otherabundancescorrespondtotheabundanceoftheelementsdescendedfromtheinternalnodesInthepresentexamplethesetofabundancesfor8nodesisasfollows 119886119886⋯ 119886119886119886119886 = 119901119901⋯ 119901 119901 + 119901 119901 +
119901 + 119901 119901 + 119901 where 119901 istherelativeabundanceofspeciesi
p1 p3p2 p4 p5
p1 + p2
p4 + p5
p1 + p2 + p3
MRCA
3
RcodeforInformation-basedDiversityPartitioning(iDIP)UnderMulti-LevelHierarchicalStructuresforSpeciesandPhylogeneticDiversities
SpeciesAllelicDiversity(RFunctioniDIP) Inputshouldincludetwodatamatrices(calledldquoAbunrdquoandldquoStrucrdquorespectively)(a) Abunspecifyingspeciesalleles(rows)raworrelativefrequenciesineach
populationcommunity(columns)iDIPcannothandleldquoblankrdquoorldquoNArdquoentry
YoumustreplaceldquoblankrdquoorldquoNArdquoinyourdataby0oranynumericalvalueAlsotheremustbeatleastonespeciesalleleinapopulationoracommunity
(b) Strucspecifyingahierarchicalstructurematrixseeasimpleexamplebelow OurRcodecanbeappliedtoanynumberoflevelsForsimplicitywejustusea
three-levelhierarchicalstructuretoillustratehowtoinputdataConsidertherearetworegions(1and2)inanecosystemInRegion1therearetwopopulationsandinRegion2therearethreepopulationsThehierarchicalstructureis
displayedasthefollowing
Supposetherawallelefrequenciesaregiveninthefollowingmatrix(allelesin
rowsandpopulationsincolumns)
Ecosystem13
Region113
Population113 Population213
Region213
Population313 Population413 Population513
4
Pop1 Pop2 Pop3 Pop4 Pop5
Allele1 1 16 2 10 15
Allele2 0 0 0 5 14
Allele3 7 12 11 1 0
Allele4 0 5 14 1 21
Allele5 2 1 0 11 10
Allele6 0 1 3 2 0
Thehierarchicalstructurematrixforthissimpleexampleshouldbeinputasamatrixwithlevelsinrowsandpopulationsincolumns(Level1=populationlevelLevel2=regionlevelLevel3=ecosystemlevel)Hierarchicalstructureof
anynumberoflevelscanbeexpressedinasimilarmanner
Pop1 Pop2 Pop3 Pop4 Pop5
Level3 Ecosystem Ecosystem Ecosystem Ecosystem Ecosystem
Level2 Region1 Region1 Region2 Region2 Region2
Level1 Population1 Population2 Population3 Population4 Population5
FortheabovesimpleexampleinputdataforRfunctioniDIP(codegivenbelow)
areshownbelow InputdataData=cbind(c(107020)c(16012511)c(20111403)c(10511112)c(151
4021100))Struc=rbind(rep(Ecosystem5)c(rep(Region12)rep(Region23))paste0(Population15))
RunRfunction iDIP(DataStruc)
5
Output is given below
[1]
D_gamma 5272
D_alpha2 4679
D_alpha1 3540
D_beta2 1127
D_beta1 1322
Proportion2 0300
Proportion1 0700
Differentiation2 0204
Differentiation1 0310
We give simple interpretations for the above output (all the ldquoeffectiverdquo is in asenseofequallyabundantallelespopulationsregions)(1a) D_gamma = 5272 is interpreted as that the effective number of alleles in the
ecosystem (total diversity) is 5272
(1b) D_alpha2 = 4679 is interpreted as that each region contains 4679 allele
equivalents
D-beta2 = 1127 implies that there are 113 region equivalents Thus 4679 x
1127 = 5272 (=D_gamma)
(1c) D_alpha1 = 3540 is interpreted as that each population within a region contains
3540 allele equivalents
D_beta1 =1322 is interpreted as that there are 132 population equivalents per
region
Here 1322 x 3540 = 4679 species per region (= D_alpha2)
(2) Proportion2 = 030 means that the proportion of total beta information found at
the regional level is 30
Proportion1 = 070 means that the proportion of total beta information found at
the population level is 70
(3) Differentiation2 =0204 implies that the mean differentiationdissimilarity among
regions is 0204 This can be interpreted as the following effective sense the mean
proportion of non-shared alleles in a region is around 204
Differentiation1 =0310 implies that the mean differentiationdissimilarity among
6
populations within a region is 031 ie the mean proportion of non-shared alleles
in a population is around 310
R code for Information-based Diversity Partitioning for Any Number of Levels

Input data:
abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)

Output:
(1) Gamma (or total) diversity, alpha and beta diversity at each level
(2) Proportion of total beta information found at each level
(3) Differentiation (dissimilarity) for each level. For example, Differentiation1 measures the mean dissimilarity among populations (level 1) within a region; Differentiation2 measures the mean dissimilarity among regions (level 2), etc.

NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

Main function iDIP

iDIP = function(abun, struc) {
  n = sum(abun)
  N = ncol(abun)
  # Gamma (pooled) Shannon entropy
  ga = rowSums(abun)
  gp = ga[ga > 0] / n
  G = sum(-gp * log(gp))
  S = length(gp)
  H = nrow(struc)
  A = numeric(H - 1)
  W = numeric(H - 1)
  B = numeric(H - 1)
  Diff = numeric(H - 1)
  Prop = numeric(H - 1)
  # Lowest level (populations): weights, weight entropy and alpha entropy
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H - 1] = sum(wi * Ai)
  # Intermediate levels: pool the populations that belong to the same unit
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ])
      NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          ai[, j] = abun[, II]
        } else {
          ai[, j] = rowSums(abun[, II])
        }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i - 1] = sum(wi * Ai)
    }
  }
  # Differentiation, proportion of beta information and beta diversity per level
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }
  Gamma = exp(G)
  Alpha = exp(A)
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha", (H - 1):1),
                     paste0("D_beta", (H - 1):1),
                     paste0("Proportion", (H - 1):1),
                     paste0("Differentiation", (H - 1):1))
  return(out)
}
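The pooling step inside the loop over intermediate levels can be illustrated on its own. The sketch below is not the authors' function: it pools the example populations into their two regions with base R's rowsum(), computes the weighted mean Shannon entropy of the regions, and converts it into D_alpha2. The helper name shannon and the use of the reported D_gamma = 5.272 as an input are assumptions made for illustration.

```r
# Example allele counts (alleles in rows, populations in columns)
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c("Region1", "Region1", "Region2", "Region2", "Region2")

# Hypothetical helper: Shannon entropy of a vector of counts
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Pool population columns by region (alleles x regions)
pooled <- t(rowsum(t(Data), region))

# Region weights are the regions' shares of the total count
w <- colSums(pooled) / sum(pooled)

# Weighted mean entropy of regions, then its exponential
H_alpha2 <- sum(w * apply(pooled, 2, shannon))
D_alpha2 <- exp(H_alpha2)

# Beta diversity among regions, using the reported gamma diversity
D_beta2 <- 5.272 / D_alpha2
print(round(c(D_alpha2 = D_alpha2, D_beta2 = D_beta2), 3))
```

The result should agree with the example output to rounding (D_alpha2 near 4.679 and D_beta2 near 1.127).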
Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) described for species diversity, we also need to input a phylogenetic "Tree" in Newick format.
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species (alleles) is given below.

Input data:
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) = paste0("Allele", 1:6)
Struc = rbind(rep("Ecosystem", 5), c(rep("Region1", 2), rep("Region2", 3)), paste0("Community", 1:5))
Tree = c("((((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125));")

Run R function:
iDIPphylo(Data, Struc, Tree)
Output is given below:

               [,1]
Faith's PD  321.525
mean_T       94.169
PD_gamma    274.388
PD_alpha2   255.194
PD_alpha1   223.231
PD_beta2      1.075
PD_beta1      1.143
PD_prop2      0.351
PD_prop1      0.649
PD_diff2      0.124
PD_diff1      0.149
The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.
(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194.
PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma).
(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231.
PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2).
(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%.
PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. In the effective sense, this means that the mean proportion of non-shared lineages in a region is around 12.4%.
PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
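Items (1) and (2) can be checked by hand from the branch lengths of the example tree. The sketch below uses base R only and does not call iDIPphylo. The decimal placement of the branch lengths is an assumption (the decimal points are missing from the extracted Newick string), and the labels given to the internal branches are hypothetical.

```r
# Example allele counts (alleles in rows, communities in columns)
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
rownames(Data) <- paste0("Allele", 1:6)

# Terminal branch lengths (assumed decimal placement)
tip_len <- c(Allele1 = 16.66254448, Allele2 = 28.86156926, Allele3 = 59.19367445,
             Allele4 = 9.67060281, Allele5 = 49.65919121, Allele6 = 15.361314)

# Internal branch lengths; the names are hypothetical labels
int_len <- c(A12 = 43.70264926, A123 = 43.49065302, A456 = 54.92297125)

# Faith's PD is the sum of all branch lengths
faith_pd <- sum(tip_len) + sum(int_len)

# Internal branch length lying between each tip and the root
above <- c(Allele1 = sum(int_len[c("A12", "A123")]),
           Allele2 = sum(int_len[c("A12", "A123")]),
           Allele3 = int_len[["A123"]],
           Allele4 = int_len[["A456"]],
           Allele5 = int_len[["A456"]],
           Allele6 = int_len[["A456"]])
root_to_tip <- tip_len + above

# Abundance-weighted mean root-to-tip distance
p <- rowSums(Data) / sum(Data)
mean_T <- sum(p * root_to_tip)
print(round(c(faith_pd = faith_pd, mean_T = mean_T), 3))
```

Under the assumed branch lengths, both values agree with the reported 321.525 and 94.169 to three decimals, which also supports the decimal placement used here.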
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

Input data:
abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study

NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

Output:
(1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
(2) mean_T: weighted (by species abundance) mean of the distances from the root node to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
(3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level
(4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information found in Level 1 and Level 2, respectively
(5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1 measures the mean phylogenetic dissimilarity among communities (Level 1) within a region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among regions (Level 2), etc.

Three packages, "ade4", "ape" and "phytools", must be installed first:

install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

Main function iDIPphylo
iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, j] = 1 if node j lies on the path from the root to leaf i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }
  # Node abundances: each node receives the abundance of its descendant leaves
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) {
    pA[, i] = Temp[, i] %*% M
  }
  # Branch lengths, in the same order as nodenames
  pB = c(phyloData$leaves, phyloData$nodes)
  n = sum(abun)
  N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n
  TT = sum(gp * pB)
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])
  H = nrow(struc)
  A = numeric(H - 1)
  W = numeric(H - 1)
  B = numeric(H - 1)
  Diff = numeric(H - 1)
  Prop = numeric(H - 1)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H - 1] = sum(wi * Ai)
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ])
      NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA))
      ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          pi[, j] = pA[, II] / sum(abun[, II])
          ni[j] = sum(abun[, II])
        } else {
          pi[, j] = rowSums(pA[, II]) / sum(abun[, II])
          ni[j] = sum(abun[, II])
        }
      }
      # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = ni / sum(ni)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i - 1] = sum(wi * Ai)
    }
  }
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }
  # Scale by mean_T so gamma and alpha are effective total branch lengths
  Gamma = exp(G) * TT
  Alpha = exp(A) * TT
  # Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  # out1 = iDIP(abun, struc)
  # out = cbind(out1, out2)
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H - 1):1),
                     paste0("PD_beta", (H - 1):1),
                     paste0("PD_prop", (H - 1):1),
                     paste0("PD_diff", (H - 1):1))
  return(out)
}
Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_8\} = \{p_1, p_2, \ldots, p_5,\ p_1 + p_2,\ p_1 + p_2 + p_3,\ p_4 + p_5\}$, where $p_i$ is the relative abundance of species $i$.

[Figure S2: tree diagram with tips p1 to p5, internal nodes p1 + p2, p4 + p5 and p1 + p2 + p3, and the root labelled MRCA.]
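The extended abundance set of Figure S2 can be built with a tip-by-node membership matrix, which is the same device the iDIPphylo code uses to obtain node abundances. The following is a small sketch in R; the relative abundances are hypothetical values chosen only for illustration.

```r
# Hypothetical relative abundances of the five tips (they sum to 1)
p <- c(p1 = 0.10, p2 = 0.20, p3 = 0.30, p4 = 0.15, p5 = 0.25)

# Membership matrix: rows are tips, columns are the 8 nodes
# (5 tips followed by the 3 internal nodes of Figure S2)
M <- cbind(diag(5),
           c(1, 1, 0, 0, 0),   # internal node above tips 1 and 2
           c(1, 1, 1, 0, 0),   # internal node above tips 1, 2 and 3
           c(0, 0, 0, 1, 1))   # internal node above tips 4 and 5

# Each node's abundance is the summed abundance of its descendant tips
a <- as.vector(p %*% M)
names(a) <- c(names(p), "p1+p2", "p1+p2+p3", "p4+p5")
print(a)
```

The first five entries reproduce the tip abundances, and the last three are the sums attached to the internal nodes.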
R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as follows:

Ecosystem
    Region1
        Population1, Population2
    Region2
        Population3, Population4, Population5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

          Pop1  Pop2  Pop3  Pop4  Pop5
Allele1      1    16     2    10    15
Allele2      0     0     0     5    14
Allele3      7    12    11     1     0
Allele4      0     5    14     1    21
Allele5      2     1     0    11    10
Allele6      0     1     3     2     0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure of any number of levels can be expressed in a similar manner.

         Pop1         Pop2         Pop3         Pop4         Pop5
Level3   Ecosystem    Ecosystem    Ecosystem    Ecosystem    Ecosystem
Level2   Region1      Region1      Region2      Region2      Region2
Level1   Population1  Population2  Population3  Population4  Population5
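Before running the partitioning, it can help to confirm that the structure matrix encodes the intended nesting. The sketch below builds the example matrix and lists the populations under each region; the column names and the variable members are illustrative additions and are not required by iDIP.

```r
# Structure matrix: highest level in the first row, populations in the last
Struc <- rbind(rep("Ecosystem", 5),
               c(rep("Region1", 2), rep("Region2", 3)),
               paste0("Population", 1:5))
colnames(Struc) <- paste0("Pop", 1:5)   # labels added for readability only

# Populations grouped by the region labels in row 2
members <- split(colnames(Struc), Struc[2, ])
print(members)

# The top row should contain a single unit (the ecosystem)
stopifnot(length(unique(Struc[1, ])) == 1)
```

Here Region1 should list Pop1 and Pop2, and Region2 should list Pop3, Pop4 and Pop5.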
For the above simple example, the input data for the R function iDIP (code given below) are as follows.

Input data:
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem", 5), c(rep("Region1", 2), rep("Region2", 3)), paste0("Population", 1:5))

Run R function:
iDIP(Data, Struc)

Output is given below:

                   [,1]
D_gamma           5.272
D_alpha2          4.679
D_alpha1          3.540
D_beta2           1.127
D_beta1           1.322
Proportion2       0.300
Proportion1       0.700
Differentiation2  0.204
Differentiation1  0.310
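The first row of the output can be reproduced directly from the data: gamma diversity of order 1 is the exponential of the Shannon entropy of the pooled allele frequencies. This is a short sketch that follows the same formula as the first lines of the iDIP function; the variable names are illustrative.

```r
# Example allele counts (alleles in rows, populations in columns)
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))

# Pooled relative frequencies of the six alleles
p <- rowSums(Data) / sum(Data)

# Shannon entropy of the pooled frequencies and its exponential
H_gamma <- -sum(p[p > 0] * log(p[p > 0]))
D_gamma <- exp(H_gamma)
print(round(D_gamma, 3))
```

The value should round to 5.272, the D_gamma reported in the output.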
We give simple interpretations for the above output (all "effective" quantities are in the sense of equally abundant alleles/populations/regions).
(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents.
D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).
(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents.
D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).
(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%.
4
Pop1 Pop2 Pop3 Pop4 Pop5
Allele1 1 16 2 10 15
Allele2 0 0 0 5 14
Allele3 7 12 11 1 0
Allele4 0 5 14 1 21
Allele5 2 1 0 11 10
Allele6 0 1 3 2 0
Thehierarchicalstructurematrixforthissimpleexampleshouldbeinputasamatrixwithlevelsinrowsandpopulationsincolumns(Level1=populationlevelLevel2=regionlevelLevel3=ecosystemlevel)Hierarchicalstructureof
anynumberoflevelscanbeexpressedinasimilarmanner
Pop1 Pop2 Pop3 Pop4 Pop5
Level3 Ecosystem Ecosystem Ecosystem Ecosystem Ecosystem
Level2 Region1 Region1 Region2 Region2 Region2
Level1 Population1 Population2 Population3 Population4 Population5
FortheabovesimpleexampleinputdataforRfunctioniDIP(codegivenbelow)
areshownbelow InputdataData=cbind(c(107020)c(16012511)c(20111403)c(10511112)c(151
4021100))Struc=rbind(rep(Ecosystem5)c(rep(Region12)rep(Region23))paste0(Population15))
RunRfunction iDIP(DataStruc)
5
Output is given below
[1]
D_gamma 5272
D_alpha2 4679
D_alpha1 3540
D_beta2 1127
D_beta1 1322
Proportion2 0300
Proportion1 0700
Differentiation2 0204
Differentiation1 0310
We give simple interpretations of the above output (throughout, "effective" is in the sense of equally abundant alleles/populations/regions).
(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents.
D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).
(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents.
D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).
(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%.
Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%.
Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
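The gamma, alpha and beta values in this example can be reproduced by hand from Shannon entropies. The sketch below is an illustration under stated assumptions rather than the authors' code: the helper shannon and all variable names are introduced here, the counts are copied from the example table, and each alpha entropy is taken as the size-weighted mean of the within-unit entropies, as in the iDIP listing.

```r
# Shannon entropy (natural log) of a vector of counts; zeros contribute nothing
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Allele counts from the example (rows = alleles, columns = populations)
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c("Region1", "Region1", "Region2", "Region2", "Region2")

# Gamma: entropy of the pooled ecosystem
H_gamma <- shannon(rowSums(Data))

# Region-level alpha: pool the populations of each region, then average the
# regional entropies with weights proportional to regional size
RegionData <- sapply(unique(region),
                     function(r) rowSums(Data[, region == r, drop = FALSE]))
w_region <- colSums(RegionData) / sum(RegionData)
H_alpha2 <- sum(w_region * apply(RegionData, 2, shannon))

# Population-level alpha: the same averaging over single populations
w_pop <- colSums(Data) / sum(Data)
H_alpha1 <- sum(w_pop * apply(Data, 2, shannon))

# Effective numbers are exponentials of entropies; beta is a ratio
D_gamma  <- exp(H_gamma)
D_alpha2 <- exp(H_alpha2)
D_alpha1 <- exp(H_alpha1)
D_beta2  <- D_gamma / D_alpha2
D_beta1  <- D_alpha2 / D_alpha1

round(c(D_gamma = D_gamma, D_alpha2 = D_alpha2, D_alpha1 = D_alpha1,
        D_beta2 = D_beta2, D_beta1 = D_beta1), 3)
```

Because each beta is defined as a ratio of successive diversities, the multiplicative relations quoted in (1b) and (1c) hold by construction.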
R code for Information-based Diversity Partitioning for Any Number of Levels
# input data:
#   abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
# output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level
#   (2) Proportion of total beta information found at each level
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1 measures the mean dissimilarity among populations (level 1) within a region; Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entry. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
# Main function iDIP
iDIP=function(abun,struc){
  n=sum(abun); N=ncol(abun)
  # pooled (gamma) entropy
  ga=rowSums(abun)
  gp=ga[ga>0]/n
  G=sum(-gp*log(gp))
  S=length(gp)
  H=nrow(struc)
  A=numeric(H-1); W=numeric(H-1); B=numeric(H-1)
  Diff=numeric(H-1); Prop=numeric(H-1)
  # lowest level: weights and alpha entropy of the individual populations
  wi=colSums(abun)/n
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]))
  pi=sapply(1:N,function(k) abun[,k]/sum(abun[,k]))
  Ai=sapply(1:N,function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])))
  A[H-1]=sum(wi*Ai)
  # intermediate levels: pool the populations belonging to each unit
  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,]); NN=length(I)
      ai=matrix(0,ncol=NN,nrow=nrow(abun))
      for(j in 1:NN){
        II=which(struc[i,]==I[j])
        if(length(II)==1){ai[,j]=abun[,II]
        } else {ai[,j]=rowSums(abun[,II])}
      }
      pi=sapply(1:NN,function(k) ai[,k]/sum(ai[,k]))
      wi=colSums(ai)/sum(ai)
      W[i-1]=-sum(wi*log(wi))
      Ai=sapply(1:NN,function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])))
      A[i-1]=sum(wi*Ai)
    }
  }
  # partition of the total beta information
  total=G-A[H-1]
  Diff[1]=(G-A[1])/W[1]
  Prop[1]=(G-A[1])/total
  B[1]=exp(G)/exp(A[1])
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1])
      Prop[i]=(A[i-1]-A[i])/total
      B[i]=exp(A[i-1])/exp(A[i])
    }
  }
  Gamma=exp(G); Alpha=exp(A)
  out=matrix(c(Gamma,Alpha,B,Prop,Diff),ncol=1)
  rownames(out)<-c(paste0("D_gamma"),
    paste0("D_alpha",(H-1):1), paste0("D_beta",(H-1):1),
    paste0("Proportion",(H-1):1), paste0("Differentiation",(H-1):1))
  return(out)
}
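The Proportion and Differentiation entries follow from the same entropies together with the entropies of the region and population weights. The following sketch recomputes them for the five-population example without calling iDIP. It is an illustration only: the helper shannon and the names G, A2, A1, W2 and W1 are introduced here, with the suffix matching the level label used in the output rather than the index used inside the function.

```r
# Shannon entropy (natural log) of a vector of counts or weights
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Allele counts from the example (rows = alleles, columns = populations)
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c("Region1", "Region1", "Region2", "Region2", "Region2")
RegionData <- sapply(unique(region),
                     function(r) rowSums(Data[, region == r, drop = FALSE]))

# Gamma entropy and size-weighted alpha entropies at each level
G  <- shannon(rowSums(Data))
A2 <- sum(colSums(RegionData) / sum(RegionData) * apply(RegionData, 2, shannon))
A1 <- sum(colSums(Data) / sum(Data) * apply(Data, 2, shannon))

# Entropies of the relative sizes of regions and of populations
W2 <- shannon(colSums(RegionData))
W1 <- shannon(colSums(Data))

# Share of the total beta information (G - A1) found at each level
total <- G - A1
Proportion2 <- (G - A2) / total
Proportion1 <- (A2 - A1) / total

# Differentiation: beta information divided by the matching weight entropy
Differentiation2 <- (G - A2) / W2
Differentiation1 <- (A2 - A1) / (W1 - W2)

round(c(Proportion2 = Proportion2, Proportion1 = Proportion1,
        Differentiation2 = Differentiation2,
        Differentiation1 = Differentiation1), 3)
```

The two proportions sum to one by construction, since the numerators (G - A2) and (A2 - A1) add up to the total beta information.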
Phylogenetic Diversity (R function iDIPphylo)
In addition to the two data matrices (called "Abun" and "Struc", respectively) described for species diversity, we also need to input a phylogenetic "Tree" in Newick format.
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entry. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.
Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.
# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) = paste0("Allele",1:6)
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")
# Run R function
iDIPphylo(Data, Struc, Tree)
Output is given below.
               [,1]
Faith's PD  321.525
mean_T       94.169
PD_gamma    274.388
PD_alpha2   255.194
PD_alpha1   223.231
PD_beta2      1.075
PD_beta1      1.143
PD_prop2      0.351
PD_prop1      0.649
PD_diff2      0.124
PD_diff1      0.149
The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.
(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194.
PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma).
(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231.
PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2).
(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%.
PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.
(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%.
PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
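The first three phylogenetic quantities can be checked in base R without any tree package, because they depend only on the branch lengths and on the pooled relative abundance descending from each branch. The sketch below is an illustration under stated assumptions: the branch lengths are transcribed from the Newick string of the example tree, the node labels Node12, Node123 and Node456 are hypothetical names for its three internal branches, and the gamma formula mirrors the entropy expression used in the iDIPphylo listing.

```r
# Allele counts from the example (rows = alleles, columns = populations)
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))

# Pooled relative abundance of each allele (tip)
a <- rowSums(Data) / sum(Data)
names(a) <- paste0("Allele", 1:6)

# Relative abundance descending from each branch: tips, then internal branches
# (Node12 = Allele1 + Allele2, Node123 = Node12 + Allele3, Node456 = Allele4..6)
abund <- c(a,
           Node12  = sum(a[1:2]),
           Node123 = sum(a[1:3]),
           Node456 = sum(a[4:6]))

# Branch lengths in the same order (transcribed from the example tree)
len <- c(Allele1 = 16.66254448, Allele2 = 28.86156926, Allele3 = 59.19367445,
         Allele4 = 9.67060281,  Allele5 = 49.65919121, Allele6 = 15.361314,
         Node12  = 43.70264926, Node123 = 43.49065302, Node456 = 54.92297125)

# Faith's PD: total length of the branches that carry any abundance
faith_pd <- sum(len[abund > 0])

# mean_T: abundance-weighted mean distance from the root to the tips
mean_T <- sum(len * abund)

# Gamma phylogenetic entropy and the corresponding effective branch length
G <- -sum(len * abund / mean_T * log(abund / mean_T))
PD_gamma <- exp(G)

round(c(faith_pd = faith_pd, mean_T = mean_T, PD_gamma = PD_gamma), 3)
```

Since every allele occurs somewhere in the ecosystem, Faith's PD here is simply the sum of all nine branch lengths.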
R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
# input data:
#   abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
#   tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study
# NOTE: iDIP cannot handle "blank" or "NA" entry. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
# output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root node to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information found in Level 1 and Level 2, respectively
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1 measures the mean phylogenetic dissimilarity among communities (Level 1) within a region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among regions (Level 2), etc.
# Three packages "ade4", "ape" and "phytools" must be installed first.
install.packages("ade4"); library(ade4)
install.packages("ape"); library(ape)
install.packages("phytools"); library(phytools)
# Main function iDIPphylo
iDIPphylo=function(abun,struc,tree){
  phyloData<-newick2phylog(tree)
  Temp<-as.matrix(abun[names(phyloData$leaves),])
  nodenames=c(names(phyloData$leaves),names(phyloData$nodes))
  # M[i,b]=1 if branch b lies on the path from the root to leaf i
  M=matrix(0,nrow=length(phyloData$leaves),ncol=length(nodenames),dimnames=list(names(phyloData$leaves),nodenames))
  for(i in 1:length(phyloData$leaves)){M[i,][unlist(phyloData$paths[i])]=rep(1,length(unlist(phyloData$paths[i])))}
  # pA: abundance descending from each branch in each population; pB: branch lengths
  pA=matrix(0,ncol=ncol(abun),nrow=length(nodenames),dimnames=list(nodenames,colnames(abun)))
  for(i in 1:ncol(abun)){pA[,i]=Temp[,i]%*%M}
  pB=c(phyloData$leaves,phyloData$nodes)
  n=sum(abun); N=ncol(abun)
  ga=rowSums(pA)
  gp=ga/n; TT=sum(gp*pB)
  G=sum(-pB[gp>0]*gp[gp>0]/TT*log(gp[gp>0]/TT))
  PD=sum(pB[gp>0])
  H=nrow(struc)
  A=numeric(H-1); W=numeric(H-1); B=numeric(H-1)
  Diff=numeric(H-1); Prop=numeric(H-1)
  wi=colSums(abun)/n
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]))
  pi=sapply(1:N,function(k) pA[,k]/sum(abun[,k]))
  Ai=sapply(1:N,function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)))
  A[H-1]=sum(wi*Ai)
  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,]); NN=length(I)
      pi=matrix(0,ncol=NN,nrow=nrow(pA)); ni=numeric(NN)
      for(j in 1:NN){
        II=which(struc[i,]==I[j])
        if(length(II)==1){pi[,j]=pA[,II]/sum(abun[,II]); ni[j]=sum(abun[,II])
        } else {pi[,j]=rowSums(pA[,II])/sum(abun[,II]); ni[j]=sum(abun[,II])}
      }
      wi=ni/sum(ni)
      W[i-1]=-sum(wi*log(wi))
      Ai=sapply(1:NN,function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)))
      A[i-1]=sum(wi*Ai)
    }
  }
  total=G-A[H-1]
  Diff[1]=(G-A[1])/W[1]
  Prop[1]=(G-A[1])/total
  B[1]=exp(G)/exp(A[1])
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1])
      Prop[i]=(A[i-1]-A[i])/total
      B[i]=exp(A[i-1])/exp(A[i])
    }
  }
  # Gamma and Alpha are reported as effective total branch lengths
  Gamma=exp(G); Alpha=exp(A)
  out=matrix(c(PD,TT,Gamma,Alpha,B,Prop,Diff),ncol=1)
  rownames(out)<-c(paste("Faith's PD"), paste("mean_T"),
    paste0("PD_gamma"), paste0("PD_alpha",(H-1):1), paste0("PD_beta",(H-1):1),
    paste0("PD_prop",(H-1):1), paste0("PD_diff",(H-1):1))
  return(out)
}
7
Diff=numeric(H-1)Prop=numeric(H-1)
wi=colSums(abun)n W[H-1]=-sum(wi[wigt0]log(wi[wigt0])) pi=sapply(1Nfunction(k)abun[k]sum(abun[k]))
Ai=sapply(1Nfunction(k)-sum(pi[k][pi[k]gt0]log(pi[k][pi[k]gt0]))) A[H-1]=sum(wiAi) if(Hgt2)
for(iin2(H-1)) I=unique(struc[i])NN=length(I) ai=matrix(0ncol=NNnrow=nrow(abun))
for(jin1NN) II=which(struc[i]==I[j]) if(length(II)==1)ai[j]=abun[II]
elseai[j]=rowSums(abun[II]) pi=sapply(1NNfunction(k)ai[k]sum(ai[k]))
wi=colSums(ai)sum(ai) W[i-1]=-sum(wilog(wi)) Ai=sapply(1NNfunction(k)-sum(pi[k][pi[k]gt0]log(pi[k][pi[k]gt0])))
A[i-1]=sum(wiAi)
total=G-A[H-1] Diff[1]=(G-A[1])W[1] Prop[1]=(G-A[1])total
B[1]=exp(G)exp(A[1]) if(Hgt2) for(iin2(H-1))
Diff[i]=(A[i-1]-A[i])(W[i]-W[i-1]) Prop[i]=(A[i-1]-A[i])total B[i]=exp(A[i-1])exp(A[i])
Gamma=exp(G)Alpha=exp(A)Diff=DiffProp=Prop
8
out=matrix(c(GammaAlphaBPropDiff)ncol=1) rownames(out)lt-c(paste0(D_gamma)
paste0(D_alpha(H-1)1) paste0(D_beta(H-1)1) paste0(Proportion(H-1)1)
paste0(Differentiation(H-1)1) ) return(out)
9
PhylogeneticDiversity(RfunctioniDIPphylo)
Inadditiontothetwodatamatrices(calledldquoAbunrdquoldquoStrucrdquorespectively)asdescribedinthespeciesdiversitywealsoneedtoinputaphylogeneticldquoTreerdquoinNewicktreeformat)
(a) Abunspecifyingspeciesalleles(rows)raworrelativefrequenciesineachpopulationcommunity(columns) NOTEspeciesnamesintheldquoAbunrdquomatrixshouldbeexactlythesameas
thoseintheuploadedNewicktreeformatiDIPcannothandleldquoblankrdquoorldquoNArdquoentryYoumustreplaceldquoblankrdquoorldquoNArdquoinyourdataby0oranynumericalvalueAlsotheremustbeatleastonespeciesalleleinapopulationora
community(b) Strucspecifyinghierarchicalstructurematrixseethesimpleexamplegiven
aboveforthespeciesdiversity
(c) Treeaphylogenetictreespannedbyallspeciesconsideredinthestudy
Hereweusethesamehierarchicalstructureandalleleabundancesdataasinthe
speciesdiversityforillustrationAsimulatedphylogenetictreefor6speciesaregivenbelow
InputdataData=cbind(c(107020)c(16012511)c(20111403)c(10511112)c(1514021100))
rownames(Data)=paste0(Allele16)Struc=rbind(rep(Ecosystem5)c(rep(Region12)rep(Region23))paste0(Community15))
Tree=c((((Allele11666254448Allele22886156926)4370264926Allele35919367445)4349065302(Allele4967060281Allele54965919121Allele615361314)5492297125))
RunRfunction iDIPphylo(DataStrucTree)
Output is given below
[1]
10
Faiths PD 321525
mean_T 94169
PD_gamma 274388
PD_alpha2 255194
PD_alpha1 223231
PD_beta2 1075
PD_beta1 1143
PD_prop2 0351
PD_prop1 0649
PD_diff2 0124
PD_diff1 0149
Theldquoeffectiverdquointhefollowinginterpretationisinthesenseofequallyabundantandequallydivergentlineagescommunitiesregions
(1)Thetotalbranchlength(FaithrsquosPD)inthephylogenetictreeis321525 (2)Theweighted(byspeciesabundance)meanofthedistancesfromrootnode
toeachofthetipsis94169
(3a)PD_gamma=274388 is interpreted as that the effective total branch length in
the ecosystem (total phylogenetic diversity) is 274388(3b)PD_alpha2=255194is interpreted as that the effective total branch length per
region is 255194
PD-beta2 = 1075 means that there are 108 region equivalents Thus 255194x1075=274388(=PD_gamma)
(3c)PD_alpha1=223231 is interpreted as that the effective total branch length per
population within each region is 223231PD_beta1 =1143 implies that there are 114 population equivalents per region
Here223231 x1143=255194(=PD_alpha2)
(4) PD_prop2=0351meansthattheproportionoftotalphylogeneticbetainformationfoundintheregionallevelis351
PD_prop1=0649meansthattheproportionoftotalphylogeneticbetainformationfoundinthecommunitylevelis649
(5) PD_diff2=0124impliesthatthemeanphylogeneticdifferentiationamong
regionsis0124Thiscanbeinterpretedasthefollowingeffectivesensethemeanproportionofnon-sharedlineagesinaregionisaround125
11
PD_diff1=0149impliesthatthemeanphylogeneticdifferentiationamongcommunitieswithinaregionis0149iethemeanproportionofnon-shared
lineagesinacommunityisaround149
RcodeforPhylogenetic-Information-BasedDecompositionforAnyNumberofLevelsinputdata
abunallelesbypopulationfrequencymatrixdata(orspeciesbycommunity frequencymatrixdata)struclevelbypopulationhierarchicalstructurematrix(orlevelbycommunity
hierarchicalstructurematrixtreeaNewick-formatphylogenetictreespannedbyallfocalspeciesconsidered inastudy
NOTEiDIPcannothandleldquoblankrdquoorldquoNArdquoentryYoumustreplaceblank orldquoNArdquoinyourdataby0oranynumericalvalueAlsotheremustbeatleast
onespeciesalleleinapopulationoracommunityoutput
(1)FaithrsquosPDthetotalsumofbranchlengthsofaphylogenetictree(2)meanTweighted(byspeciesabundance)meanofthedistancesfromrootnodetoeachofthetipsinaphylogenetictreeForanultrametrictree
meanT=treedepth(3)Gamma(ortotal)phylogeneticdiversity(PD)oforder1alphaandbetaPDforeachlevel
(4)PD_Prop1andPD_prop2measuretheproportionsoftotalphylogeneticbetainformationfoundinLevel1andLevel2respectively(5)Phylogeneticdifferentiation(dissimilarity)foreachlevelForexample
PD_diff1measures(level-1)themeanphylogeneticdissimilarityamongcommunities(Level1)withinaregion(Level2)PD_diff2measuresthemeanphylogeneticdissimilarityamongregions(Level2)etc
# Three packages, "ade4", "ape" and "phytools", must be installed first.
install.packages("ade4"); library(ade4)
install.packages("ape"); library(ape)
install.packages("phytools"); library(phytools)

# Main function: iDIPphylo
iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, j] = 1 if branch j lies on the path between the root and tip i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }
  # pA: abundance descending from each branch, one column per community
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) pA[, i] = Temp[, i] %*% M
  pB = c(phyloData$leaves, phyloData$nodes)   # branch lengths
  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga/n
  TT = sum(gp*pB)                             # mean_T
  G = sum(-pB[gp > 0]*gp[gp > 0]/TT*log(gp[gp > 0]/TT))
  PD = sum(pB[gp > 0])                        # Faith's PD

  H = nrow(struc)
  A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
  Diff = numeric(H-1); Prop = numeric(H-1)
  # Lowest level: individual communities (populations)
  wi = colSums(abun)/n
  W[H-1] = -sum(wi[wi > 0]*log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k]/sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0]*pi[, k][pi[, k] > 0]/TT*log(pi[, k][pi[, k] > 0]/TT)))
  A[H-1] = sum(wi*Ai)
  # Intermediate levels: pool the communities that share a label in row i of struc
  if (H > 2) {
    for (i in 2:(H-1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          pi[, j] = pA[, II]/sum(abun[, II]); ni[j] = sum(abun[, II])
        } else {
          pi[, j] = rowSums(pA[, II])/sum(abun[, II]); ni[j] = sum(abun[, II])
        }
      }
      wi = ni/sum(ni)
      W[i-1] = -sum(wi*log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0]*pi[, k][pi[, k] > 0]/TT*log(pi[, k][pi[, k] > 0]/TT)))
      A[i-1] = sum(wi*Ai)
    }
  }
  # Differentiation, proportion and beta for each level
  total = G - A[H-1]
  Diff[1] = (G - A[1])/W[1]
  Prop[1] = (G - A[1])/total
  B[1] = exp(G)/exp(A[1])
  if (H > 2) {
    for (i in 2:(H-1)) {
      Diff[i] = (A[i-1] - A[i])/(W[i] - W[i-1])
      Prop[i] = (A[i-1] - A[i])/total
      B[i] = exp(A[i-1])/exp(A[i])
    }
  }
  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste("Faith's PD"), paste("mean_T"),
                     paste0("PD_gamma"), paste0("PD_alpha", (H-1):1), paste0("PD_beta", (H-1):1),
                     paste0("PD_prop", (H-1):1), paste0("PD_diff", (H-1):1))
  return(out)
}
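The central step of the function is the conversion of tip abundances into branch abundances, followed by an order-1 entropy taken over branches. The base-R sketch below illustrates that step only, under stated assumptions: the tree ((A:1,B:1):1,C:2) and the counts are hypothetical, the tip-by-branch membership matrix is written by hand instead of being built with `newick2phylog`, and there is a single assemblage, so no alpha or beta components are computed.

```r
# Hypothetical ultrametric tree ((A:1,B:1)AB:1,C:2); branches named by the node below them
branch_len <- c(A = 1, B = 1, C = 2, AB = 1)

# M[i, j] = 1 if branch j lies on the path between the root and tip i
M <- matrix(c(1, 0, 0, 1,
              0, 1, 0, 1,
              0, 0, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), names(branch_len)))

tip_abun  <- c(A = 1, B = 1, C = 2)          # hypothetical counts
node_abun <- as.vector(tip_abun %*% M)       # abundance descending from each branch
names(node_abun) <- names(branch_len)

gp        <- node_abun / sum(tip_abun)       # relative branch abundances
mean_T    <- sum(gp * branch_len)            # abundance-weighted root-to-tip distance
G         <- -sum(branch_len * gp / mean_T * log(gp / mean_T))
faith_pd  <- sum(branch_len[gp > 0])         # total branch length
pd_order1 <- exp(G)                          # effective total branch length, 2^2.25 here

print(c(mean_T = mean_T, faith_pd = faith_pd, pd_order1 = pd_order1))
```

In this toy case the tree is ultrametric, so mean_T equals the tree depth (2), and the order-1 value (about 4.76) falls between mean_T and Faith's PD (5), the same ordering seen in the reported output.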