Evolutionary Applications. 2018;1–18. wileyonlinelibrary.com/journal/eva
Received: 31 August 2017 | Accepted: 21 December 2017
DOI: 10.1111/eva.12593

SPECIAL ISSUE ORIGINAL ARTICLE

Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales

Oscar E. Gaggiotti 1 | Anne Chao 2 | Pedro Peres-Neto 3 | Chun-Huo Chiu 4 | Christine Edwards 5 | Marie-Josée Fortin 6 | Lou Jost 7 | Christopher M. Richards 8 | Kimberly A. Selkoe 9,10

1 School of Biology, Scottish Oceans Institute, University of St Andrews, St Andrews, UK
2 Institute of Statistics, National Tsing Hua University, Hsin-Chu, Taiwan
3 Department of Biology, Concordia University, Montreal, QC, Canada
4 Department of Agronomy, National Taiwan University, Taipei, Taiwan
5 Center for Conservation and Sustainable Development, Missouri Botanical Garden, Saint Louis, MO, USA
6 Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada
7 Ecominga Fundation, Banos, Tungurahua, Ecuador
8 Plant Germplasm Preservation Research Unit, USDA-ARS, Fort Collins, CO, USA
9 National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, CA, USA
10 Hawai’i Institute of Marine Biology, University of Hawai’i at Mānoa, Kaneohe, HI, USA

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2018 The Authors. Evolutionary Applications published by John Wiley & Sons Ltd

Correspondence
Oscar E. Gaggiotti, School of Biology, Scottish Oceans Institute, University of St Andrews, St Andrews, UK.
Email: oeg@st-andrews.ac.uk

Funding information
US National Natural Science Foundation (BioOCE Award), Grant/Award Number: 1260169; The Marine Alliance for Science and Technology for Scotland (Scottish Funding Council), Grant/Award Number: HR09011; National Science Foundation, Grant/Award Number: DBI-1300426; The University of Tennessee; NOAA Coral Reef Conservation Program; the Ministry of Science and Technology, Taiwan; Canada Research Chair in Spatial Modelling and Biodiversity

Abstract
Biological diversity is a key concept in the life sciences and plays a fundamental role in many ecological and evolutionary processes. Although biodiversity is inherently a hierarchical concept covering different levels of organization (genes, population, species, ecological communities and ecosystems), a diversity index that behaves consistently across these different levels has so far been lacking, hindering the development of truly integrative biodiversity studies. To fill this important knowledge gap, we present a unifying framework for the measurement of biodiversity across hierarchical levels of organization. Our weighted, information-based decomposition framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon’s entropy. We investigated the numerical behaviour of our approach with simulations and showed that it can accurately describe complex spatial hierarchical structures. To demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components (alleles, species, etc.), we applied the framework to a real data set on coral reef biodiversity. We expect our framework will have multiple applications covering the fields of conservation biology, community genetics and eco-evolutionary dynamics.

KEYWORDS
biodiversity indices, genetic diversity, hierarchical spatial structure, Hill numbers, species diversity




1 | INTRODUCTION

Biological diversity is a foundational concept in the life sciences and critical to strategies for ecological conservation. However, for many decades, biodiversity has been treated in a piecemeal manner, with ecologists focusing on species diversity (but more recently also on trait and phylogenetic diversity) and population geneticists focusing on genetic diversity. This dichotomy has led to large differences in the type of diversity indices that have been used to measure species, trait, phylogenetic and genetic diversity. Ecologists were initially focused on empirical developments and generated a very large number of species diversity indices that strongly differ in their numerical behaviour (Jost, 2006) and estimation properties (Bunge, Willis, & Walsh, 2014). On the other hand, population genetics was initially dominated by theoretical developments and mathematical models focused on a specific set of parameters that described genetic diversity within and among populations, which led to the development of a restricted set of genetic diversity indices. Thus, although biodiversity is inherently a hierarchical concept covering different levels of organization (genetic, population, species, ecological communities and ecosystems), the lack of diversity indices that behave consistently across these different levels has precluded the development of truly integrative biodiversity studies.

Recently, motivated by this lack of common measures for biodiversity at different levels of biological organization, population geneticists have carried out methodological developments that extend the use of popular species diversity indices to the measurement of genetic diversity at different levels of spatial subdivision [e.g., Shannon’s and Simpson’s indices (Sherwin, Jabot, Rush, & Rossetto, 2006; Smouse, Whitehead, & Peakall, 2015)]. However, simply adapting species diversity measures is not sufficient, for two reasons. First, there is much controversy over how to quantify abundance-based species diversity in a community (Mendes, Evangelista, Thomaz, Agostinho, & Gomes, 2008). Second, there has been little agreement on how to partition diversity into its spatial components (Ellison, 2010). A promising solution for a unified measure of genetic diversity centres on Hill numbers (Hill, 1973). Indeed, a consensus is emerging on the use of Hill numbers as a unifying concept to define measures of various types of diversity, including species, phylogenetic and functional diversities (Chao, Chiu, & Jost, 2014). Importantly, Hill numbers follow the replication principle, ensuring that diversity measures are linear in relation to group pooling. As such, they can be used to develop proper partition schemes across spatial scales or other hierarchical structures, such as populations within metapopulations, species within phylogenies and communities within ecosystems, and to pool information across different levels in a hierarchy.

The purpose of this study was to present a unifying framework for the measurement of biodiversity across hierarchical levels of organization, from local population to ecosystem levels. We expect that this new framework will be a useful tool for conservation biologists and will also facilitate the development of the fields of community genetics (Agrawal, 2003) and eco-evolutionary dynamics (Hendry, 2013). This new framework may also facilitate bridging community ecology processes (selection among species, drift, dispersal and speciation) and the processes emphasized by population genetics theory (selection within species, drift, gene flow and mutation), as explored by Vellend et al. (2014). The paper starts by outlining historical developments on the formulation and use of biodiversity measures in the fields of ecology and population genetics (Section 2). We then provide an overview of the use of Hill numbers in ecology and their relationship with population genetic parameters such as $N_e$ (Section 3). Section 4 presents a weighted, information-based decomposition framework that provides measures of both genetic and species diversity at all hierarchical levels of spatial subdivision, from populations to ecosystems. This is followed by the description of software that implements the approach (Section 5). Section 6 explores patterns of species and genetic diversity under different spatial subdivision models using simulated data with known diversity hierarchical structures. Section 7 shows an application to a real data set on coral reef biodiversity (Selkoe et al., 2016). We close with a discussion of the advantages and limitations of our approach and its applications in the fields of conservation biology, community genetics and eco-evolutionary dynamics.

2 | HISTORICAL DEVELOPMENTS

Arguably, the ultimate reason for methodological divergence in diversity indices used by population geneticists and community ecologists resides in the very different contexts that led to the emergence of these two disciplines. Ecologists were interested in understanding the processes that determine the structure and composition of communities and could directly measure the community traits (number of species and their abundances) needed to compare different communities. This relatively easy access to real data and an initially limited interest in mechanistic models fostered the development of a large number of diversity measures to explore species distributional data (Magurran, 2004) and eventually made the quantification of abundance-based species diversity one of the most controversial issues in ecology. Population genetics, on the other hand, arose in response to a need to reconcile two opposing views of evolution that hinged on the type of diversity upon which natural selection acted. Darwin proposed that it was small continuous variation, while Galton believed that natural selection acted upon large discontinuous variation (Provine, 1971). Variation in this case was an abstract concept and could not be directly measured, which motivated the development of a vast body of theory centred around mathematical models describing the behaviour of a restricted set of diversity measures (Provine, 1971).

Although ecologists and population geneticists use very different approaches to measure diversity, they are both interested in describing spatial patterns by decomposing total diversity into within- and among-community/population components. But here again, methodological developments differ greatly between the two disciplines. Ecologists engaged in intense debates on the choice of partitioning schemes (Jost, 2007), while population geneticists remained largely faithful to the use of so-called fixation indices proposed by Wright (1951). Nevertheless, the recently established fields of molecular ecology, community genetics and eco-evolutionary dynamics are helping to foster a convergence between the methods used to measure species and genetic diversity. Indeed, in the last decade, population geneticists have begun to extend the use of popular species diversity metrics to the measurement of genetic diversity by deriving mathematical expressions linking them with evolutionary parameters such as effective population size and mutation and migration rates (Chao et al., 2015; Sherwin, 2010; Sherwin et al., 2006; Smouse et al., 2015).

Regardless of this very recent methodological convergence, ecologists and population geneticists face the same challenges when trying to characterize how diversity components (alpha, beta) are structured geographically. These problems have been described in great detail in the literature (e.g., see Jost, 2007, 2010), so here we will only give a very brief summary. The first problem is that the commonly used within-community and within-population abundance diversity measures (e.g., Shannon-Wiener index and heterozygosity) are in fact entropies, meaning that they quantify the uncertainty in the species or allele identity of randomly sampled individuals or alleles, respectively. Importantly, these indices do not scale linearly with an increase in diversity, and some of them (e.g., heterozygosity) reach an asymptote for large values. The second problem is that the “within-” (alpha) and “between-” (beta) components of diversity are not independent. Intuitively, if beta depends on alpha, it would be impossible to compare beta diversities across all levels at which alpha diversities differ.
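The first problem can be illustrated with a small numerical sketch. It compares heterozygosity, Shannon entropy and the exponential of Shannon entropy as the number of equally frequent alleles doubles. The function names are illustrative only and do not come from any published package.

```python
import math

def heterozygosity(p):
    """Expected heterozygosity (Gini-Simpson index): 1 - sum(p_i^2)."""
    return 1.0 - sum(x * x for x in p)

def shannon(p):
    """Shannon entropy in nats: -sum(p_i * ln p_i)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def equal_freqs(s):
    """Frequencies for s equally common alleles."""
    return [1.0 / s] * s

# Double the number of equally frequent alleles and watch each index respond.
for s in (10, 20, 40):
    p = equal_freqs(s)
    print(s, round(heterozygosity(p), 3), round(shannon(p), 3),
          round(math.exp(shannon(p)), 1))
```

In this toy setting heterozygosity moves from 0.9 towards its ceiling of 1 and Shannon entropy grows by a constant ln 2 per doubling, whereas the exponential of Shannon entropy doubles each time, which is the linear behaviour that motivates number equivalents.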

Partitioning components of diversity is central to progress on these problems. Ecologists have related the traditional alpha, beta and gamma diversity using both additive and multiplicative schemes of partitioning. On the other hand, population geneticists have always used the multiplicative scheme based on the partitioning of the probability of identity by descent of pairs of alleles (inbreeding coefficients, F). Although there has been some confusion (cf. Jost, 2008; Jost et al., 2010; Meirmans & Hedrick, 2011), it is easy to demonstrate that all estimators of $F_{ST}$, a parameter that quantifies genetic structure, including $G_{ST}$ (Nei, 1973) and θ (Weir & Cockerham, 1984), are based on the well-known multiplicative decomposition of Wright’s (1951) F-statistics, $(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST})$, where all terms are entropy measures describing the uncertainty in the identity by descent of pairs of alleles when they are sampled from the whole set of populations (metapopulation) $(1 - F_{IT})$, from within the same population $(1 - F_{IS})$ or from two different populations $(1 - F_{ST})$.
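A minimal sketch of the multiplicative decomposition follows. It assumes the common textbook definitions of the F-statistics as heterozygosity ratios (observed heterozygosity, expected heterozygosity within populations and expected heterozygosity in the total population); these definitions and the toy values are assumptions made for illustration and are not taken from the text above.

```python
def f_statistics(h_i, h_s, h_t):
    """Return (F_IS, F_ST, F_IT) from observed heterozygosity (h_i),
    expected within-population heterozygosity (h_s) and expected
    total heterozygosity (h_t)."""
    f_is = 1.0 - h_i / h_s
    f_st = 1.0 - h_s / h_t
    f_it = 1.0 - h_i / h_t
    return f_is, f_st, f_it

# Toy heterozygosities, chosen only to illustrate the identity.
f_is, f_st, f_it = f_statistics(h_i=0.30, h_s=0.40, h_t=0.50)
lhs = 1.0 - f_it                      # (1 - F_IT)
rhs = (1.0 - f_is) * (1.0 - f_st)     # (1 - F_IS)(1 - F_ST)
print(f_is, f_st, f_it, lhs, rhs)
```

Under these definitions the two sides agree by construction, because the within-population heterozygosity cancels in the product.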

As mentioned earlier, ecologists engaged in intense debates on how to partition species diversity, but in a recent Ecology forum (Ellison, 2010), contributors agreed that a first step towards reaching a consensus was to adopt Hill numbers to measure diversity. Discussions among population geneticists are less advanced because of their traditional focus on the use of genetic polymorphism data to estimate important evolutionary parameters, which requires that genetic diversity statistics be effective measures of the causes and consequences of genetic differentiation (e.g., Whitlock, 2011). Much theoretical work is still needed to demonstrate that diversity measures based on information theory do satisfy this requirement. Here, instead, we argue that the adoption of Hill numbers in population genetics is also a good starting point to reach a consensus on how to partition genetic diversity. In what follows, we first introduce Hill numbers and then present a weighted, information-based decomposition framework applicable to both community and population genetics studies.

3 | OVERVIEW OF HILL NUMBERS

There are now many articles describing the application of Hill numbers. Here, we follow Jost (2006), who reintroduced their use in ecology. As Jost (2006) noted, most diversity indices are in fact entropies that measure the uncertainty in the identity of species (or alleles) in a sample. However, true diversity measures should provide estimates of the number of distinct elements (species or alleles) in an aggregate (community or population). To derive such measures, we first note that diversity indices create equivalence classes among aggregates in the sense that all aggregates with the same diversity index value can be considered as equivalent. For example, all populations with the same heterozygosity value are equivalent in terms of this index, even if they have radically different allele frequencies (see Appendix S1 for an example). Moreover, for any given heterozygosity, there will be an “ideal” population in which all alleles are equally frequent. It is therefore possible to define an “effective number of elements” (alleles in this example) as the number of equally frequent elements in an “ideal aggregate” that has the same diversity index value as the “real aggregate”. An example of effective number in an ecological context is the effective number of species introduced by MacArthur (1965), while an equivalent concept in population genetics is the effective number of alleles (Kimura & Crow, 1964).

Note that the concept of effective population size, $N_e$, used in population genetics is analogous to that of Hill numbers but is based on a rather different rationale. More precisely, $N_e$ is defined as the number of individuals in an ideal (Wright–Fisher) population that has the same magnitude of random genetic drift as the real population being studied. There are different ways in which we can measure the strength of genetic drift, the most common being change in average inbreeding coefficient, change in allele frequency variance and rate of loss of heterozygosity, and each leads to a different type of effective size. Thus, the ideal and the real populations are equivalent in terms of the rate of loss of genetic diversity and not in terms of equal representation of distinct individuals. Probably the only similarity between $N_e$ and the rationale underlying Hill numbers is in the sense that all the individuals in the ideal population contribute equally (on average) to the gene pool of the next generation.

The application of the above-stated logic to any of the many different entropy measures used in ecology and population genetics yields a single expression for diversity:

$$^qD \equiv \left(\sum_{i=1}^{S} p_i^q\right)^{1/(1-q)}, \tag{1}$$

where S denotes the number of species or alleles, $p_i$ denotes the relative abundance or frequency of species or allele i, and the exponent and superscript q is the order of the diversity and indicates the sensitivity of $^qD$, the numbers equivalent of the diversity measure being used, to common and rare elements (Jost, 2006). The diversity of order zero (q = 0) is completely insensitive to species or allele frequencies and is known, respectively, as species or allelic richness, depending on whether it is applied to species or allele frequency data. The diversity of order one (q = 1) weights the contribution of each species or allele by its frequency, without favouring either common or rare species/alleles. Although Equation 1 is not defined for q = 1, its limit exists (Jost, 2006) and is given by Equation 2, where H is the Shannon entropy. All values of q greater than unity disproportionally favour the most common species or alleles. For example, the Simpson concentration and the Gini–Simpson index, which are, respectively, equivalent to expected homozygosity and expected heterozygosity when applied to allele frequency data, lead to diversities of order 2 and give the same effective number of species or alleles (Equation 3).

It is worth emphasizing that, among all these different number equivalents or true diversity measures, the diversity of order 1 is key because of its ability to weigh elements precisely by their frequency, without favouring either rare or common elements (Jost, 2006). Therefore, we will use this measure to define our new framework for diversity decomposition.
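The role of the order q in Equations 1 to 3 can be made concrete with a short sketch. The function name `hill_number` and the toy frequency vectors are illustrative assumptions; the q = 1 case is handled separately as the exponential of Shannon entropy because Equation 1 is undefined there.

```python
import math

def hill_number(p, q):
    """Hill number of order q (effective number of elements) for
    relative frequencies p, following Equations 1 and 2."""
    p = [x for x in p if x > 0]            # absent elements contribute nothing
    if abs(q - 1.0) < 1e-12:               # limit q -> 1: exp(Shannon entropy)
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

even = [0.25, 0.25, 0.25, 0.25]            # four equally frequent alleles
skewed = [0.7, 0.1, 0.1, 0.1]              # one common allele, three rare ones

for q in (0, 1, 2):
    print(q, round(hill_number(even, q), 3), round(hill_number(skewed, q), 3))
```

For the even vector every order returns 4, the actual number of alleles. For the skewed vector the value falls from 4 (q = 0) to about 2.56 (q = 1) and about 1.92 (q = 2), showing how higher orders give progressively more weight to the common allele.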

4 | WEIGHTED INFORMATION-BASED DECOMPOSITION FRAMEWORK (q = 1)

Our decomposition framework is focused on the information-based diversity measure (Hill number of order q = 1). In what follows, we first describe the framework in terms of abundance (species/genetic) diversities, and then we provide an equivalent formulation in terms of phylogenetic diversity. For simplicity, we will use the notation D to refer to abundance diversities and PD to refer to phylogenetic diversities, both of order q = 1. Appendix S2 lists all notation and definitions of the parameters and variables we used.

4.1 | Formulation in terms of abundance diversity

Here, we develop a framework applicable to both species (abundance, presence–absence, biomass) and genetic data to estimate alpha, beta and gamma diversities (i.e., diversity components) across different levels of a hierarchical spatial structure. In this section, we consider a very simple example of an ecosystem subdivided into multiple regions, each of which in turn is subdivided into a number of communities when considering species data, or a number of populations when considering genetic data. However, our formulation is applicable to any number of levels within a spatially hierarchical partitioning scheme and their associated number of communities and populations at each level (nested scale), such as the example considered in our simulation study below (see Figure 1). Indeed, the framework described here allows decomposing species and genetic information on an equal footing, thus allowing contrasting diversity components across communities and populations. In other words, if genetic and species abundance (or presence–absence) data are available for every population and every species, then genetic and species diversity components can be contrasted within and among spatial scales as well as across different phylogenetic levels. Note that our proposed framework is based on diversities of order q = 1, which are less sensitive than diversities of higher order to the fact that genetic information is not available for all individuals in a population but is rather based on subsamples of individuals within populations. As such, using q = 1 allows one to decompose genetic variation consistently across different spatial subdivision levels that may vary in abundance.

The final objective was to decompose the global (ecosystem) diversity into its regional and community/population-level components. We do this using the well-known additive property of Shannon entropy across hierarchical levels (and thus multiplicative partitioning of diversity) (Batty, 1976; Jost, 2007). Table 1 presents the diversities (number equivalents) that need to be estimated at each level of the hierarchy. For each level, there will be one value corresponding to species diversity and another corresponding to allelic (genetic) diversity of a particular species at a given locus (or an average across loci). Figure S1 provides a schematic representation of the calculation of diversities.

From Table 1, it is apparent that we only need to use Equation 2 to calculate three diversity indices, namely $D_\alpha^{(1)}$, $D_\alpha^{(2)}$ and $D_\gamma$. These diversity measures are defined in terms of relative abundances of the distinct elements (species or alleles) at the respective levels of the hierarchy. In what follows, we first present the framework as applied to allele count data and then explain how a simple change in the definition of a single parameter allows the application of the same framework to species abundance data. We assume that we are considering a diploid species (but the scheme can be easily generalized for polyploid species) and focus on the diversity of order q = 1, which is based on the Shannon entropy (see Equation 1).

Genetic diversity indices are calculated separately for each locus, so we focus here on a locus with S alleles. Additionally, we consider an ecosystem subdivided into K regions, each having $J_k$ local populations. Let $N_{injk}$ be the number of diploid individuals with n (= 0, 1, 2) copies of allele i in population j and region k. Then, the total number of copies of allele i in population j and region k is $N_{ijk} = \sum_{n=0}^{2} n N_{injk}$, and from this, we can derive the total number of alleles in population j and region k as $N_{+jk} = \sum_{i=1}^{S} N_{ijk}$, the total number of alleles in region k as $N_{++k} = \sum_{j=1}^{J_k} N_{+jk}$ and the total number of alleles in the ecosystem as $N_{+++} = \sum_{k=1}^{K} N_{++k}$. All allele frequencies can be derived from these allele counts. For example, the relative frequency of allele i in any given population j within region k is $p_{i|jk} = N_{ijk}/N_{+jk}$. In the case of region- and ecosystem-level allele frequencies, we pool over populations within regions and over all regions and populations within an ecosystem, respectively. We define the weight for population j and region k as $w_{jk} = N_{+jk}/N_{+++}$; the weight for region k thus becomes $w_{+k} = \sum_{j=1}^{J_k} w_{jk} = N_{++k}/N_{+++}$. Table 2 describes how allele/species relative frequencies at each level are calculated in terms of these weight functions.

$$^1D = \exp\left(-\sum_{i=1}^{S} p_i \ln p_i\right) = \exp(H) \tag{2}$$

$$^2D = 1\Big/\left(\sum_{i=1}^{S} p_i^2\right) \tag{3}$$
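The allele counts, weights and pooled frequencies defined in this section can be sketched as follows. The nested-list layout `N[k][j][i]` and all toy numbers are assumptions made for illustration; the helper `allele_copies` simply applies the sum over n = 0, 1, 2 for one allele in one population.

```python
def allele_copies(genotype_counts):
    """N_ijk from (N_i0jk, N_i1jk, N_i2jk): individuals carrying 0, 1 or 2
    copies of allele i contribute 0, 1 or 2 copies each."""
    return sum(n * count for n, count in enumerate(genotype_counts))

# N[k][j][i]: copies of allele i in population j of region k (toy data, S = 2).
N = [
    [[8, 2], [6, 4]],    # region 0 holds two populations
    [[1, 9]],            # region 1 holds one population
]
S = 2

N_pop = [[sum(pop) for pop in region] for region in N]     # N_{+jk}
N_reg = [sum(region) for region in N_pop]                  # N_{++k}
N_tot = sum(N_reg)                                         # N_{+++}

w = [[n / N_tot for n in region] for region in N_pop]      # w_{jk}
w_reg = [sum(region) for region in w]                      # w_{+k}

# Population-level frequencies p_{i|jk}
p_pop = [[[c / sum(pop) for c in pop] for pop in region] for region in N]

# Region-level frequencies p_{i|+k}: weighted average over populations
p_reg = [[sum(w[k][j] / w_reg[k] * p_pop[k][j][i] for j in range(len(N[k])))
          for i in range(S)] for k in range(len(N))]

# Ecosystem-level frequencies p_{i|++}: weighted average over all populations
p_eco = [sum(w[k][j] * p_pop[k][j][i]
             for k in range(len(N)) for j in range(len(N[k])))
         for i in range(S)]

print(p_reg, p_eco)
```

The weighted averages give the same frequencies as pooling the raw counts within each region or across the whole ecosystem, which is the equivalence expressed in Table 2.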

Using these frequencies, we can calculate the genetic diversities at each level of spatial organization. Table 3 presents the formulas for $D_\alpha^{(1)}$, $D_\alpha^{(2)}$ and $D_\gamma$; all other diversity measures can be derived from them (see Table 1). In the case of the ecosystem diversity, this amounts to simply replacing $p_i$ in Equation 2 by $p_{i|++}$, the allele frequency at the ecosystem level (see Table 2). To calculate the diversity at the regional level, we first calculate the entropy $H_{\alpha k}^{(2)}$ for each individual region k and then obtain the weighted average over all regions, $H_\alpha^{(2)}$. Finally, we calculate the exponential of the region-level entropy to obtain $D_\alpha^{(2)}$, the alpha diversity at the regional level. We proceed in a similar fashion to obtain $D_\alpha^{(1)}$, the diversity at the population level, but in this case, we need to average over regions and populations within regions.

The calculation of the equivalent diversities based on species count data can be carried out using the exact same procedure described above, but in this case, $N_{ijk}$ represents the number of individuals of species i in population j and region k. All formulas for gamma, alpha and beta, along with the differentiation measures at each level, are given in Table 3. The formulas can be directly generalized to any arbitrary number of levels (see Section 5).
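The three-level decomposition in Table 1 can be sketched in a few lines. This is an illustrative implementation, not the authors’ iDIP code: it assumes that the weighted averages of entropy use $w_{+k}$ (regions) and $w_{jk}$ (populations) as weights, since Table 3 with the exact formulas is not reproduced here. The function name and the toy counts are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

def decompose(N):
    """Three-level decomposition of order q = 1.
    N[k][j][i] = count of element i in population j of region k."""
    S = len(N[0][0])
    total = sum(sum(pop) for region in N for pop in region)
    h_pop = 0.0                        # weighted mean population entropy
    h_reg = 0.0                        # weighted mean region entropy
    eco_counts = [0] * S
    for region in N:
        reg_counts = [0] * S
        for pop in region:
            n = sum(pop)
            h_pop += (n / total) * entropy([c / n for c in pop])
            reg_counts = [a + b for a, b in zip(reg_counts, pop)]
        n_reg = sum(reg_counts)
        h_reg += (n_reg / total) * entropy([c / n_reg for c in reg_counts])
        eco_counts = [a + b for a, b in zip(eco_counts, reg_counts)]
    gamma = math.exp(entropy([c / total for c in eco_counts]))
    alpha2 = math.exp(h_reg)           # within-region diversity
    alpha1 = math.exp(h_pop)           # within-population diversity
    return {"gamma": gamma, "alpha2": alpha2, "alpha1": alpha1,
            "beta2": gamma / alpha2,   # among regions
            "beta1": alpha2 / alpha1}  # among populations within regions

N = [[[8, 2], [6, 4]], [[1, 9]]]       # toy counts: 2 regions, 3 populations
res = decompose(N)
print(res)
```

By construction the product of `alpha1`, `beta1` and `beta2` recovers `gamma`, which mirrors the multiplicative partitioning in the last column of Table 1. Populations with zero total count would need to be filtered out before calling the function.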

4.2 | Formulation in terms of phylogenetic diversity

We first present an overview of phylogenetic diversity measures applied to a single nonhierarchical case, henceforth referred to as a single aggregate for brevity, and then extend it to consider a hierarchically structured system.

4.2.1 | Phylogenetic diversity measures in a single aggregate

To formulate phylogenetic diversity in a single aggregate, we assume that all species or alleles in an aggregate are connected by a rooted ultrametric or nonultrametric phylogenetic tree, with all species/alleles as tip nodes. All phylogenetic diversity measures discussed below are computed from a given fixed tree base, or a time reference point, that is ancestral to all species/alleles in the aggregate. A convenient time reference point is the age of the root of the phylogenetic tree spanned by all elements. Assume that there are B branch segments in the tree, and thus there are B corresponding nodes, B ≥ S. The set of species/alleles is expanded to include also the internal nodes as well as the terminal nodes representing species/alleles, which will then be the first S elements (see Figure S2).

FIGURE 1 The spatial representation of 32 populations organized into a spatial hierarchy based on three scale levels: subregions (eight populations each), regions (16 populations each) and the ecosystem (all 32 populations). The dendrogram (upper panel: hierarchical representation of levels) represents the spatial relationship (i.e., geographic distance) in which each tip represents a population found in a particular site (lower panel). The cartographic representation (lower panel) represents the spatial distribution of these same populations along a geographic coordinate system.

Let $L_i$ denote the length of branch i in the tree, i = 1, 2, …, B. We first expand the set of relative abundances of elements $(p_1, p_2, \ldots, p_S)$ (see Equation 1) to a larger set $\{a_i, i = 1, 2, \ldots, B\}$ by defining $a_i$ as the total relative abundance of the elements descended from the ith node/branch, i = 1, 2, …, B. In phylogenetic diversity, an important parameter is the mean branch length $\bar{T}$, the abundance-weighted mean of the distances from the tree base to each of the terminal branch tips, that is, $\bar{T} = \sum_{i=1}^{B} L_i a_i$. For an ultrametric tree, the mean branch length is simply reduced to the tree depth T; see Figure 1 in Chao, Chiu, and Jost (2010) for an example. For simplicity, our following formulation of phylogenetic diversity is based on ultrametric trees. The extension to nonultrametric trees is straightforward (via replacing T by $\bar{T}$ in all formulas).

Chao et al. (2010, 2014) generalized Hill numbers to a class of phylogenetic diversity of order q, $^qPD$, derived as shown in Equation 4. This measure quantifies the effective total branch length during the time interval from T years ago to the present. If q = 0, then $^0PD = \sum_{i=1}^{B} L_i$, which is the well-known Faith’s PD, the sum of the branch lengths of a phylogenetic tree connecting all species. However, this measure does not consider species abundances. Rao’s quadratic entropy Q (Rao & Nayak, 1985) is a widely used measure which takes into account both phylogeny and species abundances. This measure is a generalization of the Gini–Simpson index and quantifies the average phylogenetic distance between any two individuals randomly selected from the assemblage. Chao et al. (2010) showed that the $^qPD$ measure of order q = 2 is a simple transformation of quadratic entropy, that is, $^2PD = T/(1 - Q/T)$. Again, here we focus on the $^qPD$ measure of order q = 1, which can be expressed as a function of the phylogenetic entropy (Allen, Kon, & Bar-Yam, 2009), as given in Equation 5.

Here, I denotes the phylogenetic entropy (Equation 6), which is a generalization of Shannon’s entropy that incorporates phylogenetic distances among elements. Note that when there are only tip nodes and all branches have unit length, then we have T = 1 and $^qPD$ reduces to the Hill number of order q (in Equation 1).
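The single-aggregate phylogenetic measures can be sketched as follows, under the assumption that a tree is supplied as a list of branches, each given by its length $L_i$ and the indices of the tips descended from it. This representation, the function name and the toy tree are illustrative choices, not the input format of any published tool.

```python
import math

def phylo_diversity(branches, p, q):
    """qPD for one aggregate (Equations 4 and 5).
    branches: list of (L_i, [tip indices descended from branch i]);
    p: relative abundances of the tips."""
    a = [sum(p[t] for t in tips) for _, tips in branches]    # a_i per branch
    lengths = [length for length, _ in branches]
    T = sum(L * ai for L, ai in zip(lengths, a))             # mean branch length
    terms = [(L, ai / T) for L, ai in zip(lengths, a) if ai > 0]
    if abs(q - 1.0) < 1e-12:                                 # limit q -> 1
        return math.exp(-sum(L * r * math.log(r) for L, r in terms))
    return sum(L * r ** q for L, r in terms) ** (1.0 / (1.0 - q))

# Toy ultrametric tree of depth 2: tips 0 and 1 join at depth 1, tip 2 is alone.
branches = [(1.0, [0]), (1.0, [1]), (2.0, [2]), (1.0, [0, 1])]
p = [0.25, 0.25, 0.5]

pd0 = phylo_diversity(branches, p, 0)    # Faith's PD: sum of branch lengths
pd1 = phylo_diversity(branches, p, 1)    # T * exp(I / T)
print(pd0, pd1)
```

For this toy tree the order-zero value equals the total branch length, 5, and the order-one value agrees with $T \exp(I/T)$ computed from the phylogenetic entropy. With only tip branches of unit length, the function reduces to the ordinary Hill number.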

4.2.2 | Phylogenetic diversity decomposition in a multiple-level hierarchically structured system

The single-aggregate formulation can be extended to consider a hierarchical spatially structured system. For the sake of simplicity, we consider three levels (ecosystem, region and community/population), as we did for the species/allelic diversity decomposition. Assume that there are S elements in the ecosystem. For the rooted phylogenetic tree spanned by all S elements in the ecosystem, we define the root (or a time reference point), the number of nodes/branches B and the branch length $L_i$ in a similar manner as those in a single aggregate.

Forthetipnodesasintheframeworkofspeciesandallelicdi-versity(inTable2)definepi|jk pi|+k and pi|++ i = 1 2 hellip S as the ith speciesorallelerelativefrequenciesatthepopulationregionalandecosystemlevelrespectivelyToexpandtheserelativefrequenciesto the branch set we define ai|jk i = 1 2 hellip B as the summed rela-tiveabundanceofthespeciesallelesdescendedfromtheith nodebranchinpopulation j and region k with similar definitions for ai|+k and ai|++ i = 1 2 hellip B seeFigure1ofChaoetal (2015) foran il-lustrativeexampleThedecompositionforphylogeneticdiversityissimilartothatforHillnumberspresentedinTable1exceptthatnowallmeasuresarereplacedbyphylogeneticdiversityThecorrespond-ingphylogeneticgammaalphaandbetadiversitiesateachlevelare

(4)qPD=

sumB

i=1Li

(

ai

T

)q1∕(1minusq)

(5)1PD= lim qrarr1

qPD=exp

[

minussumB

i=1Liai

Tln

(

ai

T

)]

equivT exp (I∕T)

(6)I=minussumB

i=1Liai ln ai

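TABLE 1  Various diversities in a hierarchically structured system and their decomposition based on diversity measure $D = {}^{1}D$ (Hill number of order q = 1 in Equation 2); for phylogenetic diversity decomposition, replace D with $PD = {}^{1}PD$ (phylogenetic diversity measure of order q = 1 in Equation 5); see Table 3 for all formulas for D and PD. The superscripts (1) and (2) denote the hierarchical level of focus.

| Hierarchical level | Diversity: within | Diversity: between | Diversity: total | Decomposition |
|---|---|---|---|---|
| 3 Ecosystem | − | − | $D_\gamma$ | $D_\gamma = D_\alpha^{(1)} D_\beta^{(1)} D_\beta^{(2)}$ |
| 2 Region | $D_\alpha^{(2)}$ | $D_\beta^{(2)} = D_\gamma^{(2)} / D_\alpha^{(2)}$ | $D_\gamma^{(2)} = D_\gamma$ | $D_\gamma = D_\alpha^{(2)} D_\beta^{(2)}$ |
| 1 Community or population | $D_\alpha^{(1)}$ | $D_\beta^{(1)} = D_\gamma^{(1)} / D_\alpha^{(1)}$ | $D_\gamma^{(1)} = D_\alpha^{(2)}$ | $D_\alpha^{(2)} = D_\alpha^{(1)} D_\beta^{(1)}$ |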

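TABLE 2  Calculation of allele/species relative frequencies at the different levels of the hierarchical structure

| Hierarchical level | Species/allele relative frequency |
|---|---|
| Population | $p_{i \mid jk} = N_{ijk} / N_{+jk} = N_{ijk} / \sum_{i=1}^{S} N_{ijk}$ |
| Region | $p_{i \mid +k} = N_{i+k} / N_{++k} = \sum_{j=1}^{J_k} (w_{jk} / w_{+k})\, p_{i \mid jk}$ |
| Ecosystem | $p_{i \mid ++} = N_{i++} / N_{+++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i \mid jk}$ |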


given in Table 3, along with the corresponding differentiation measures. Appendix S3 presents all mathematical derivations and discusses the desirable monotonicity and "true dissimilarity" properties that our proposed differentiation measures possess.

5 | IMPLEMENTATION OF THE FRAMEWORK BY MEANS OF AN R PACKAGE

The framework described above has been implemented in the R function iDIP (information-based Diversity Partitioning), which is provided as Data S1. We also provide a short introduction with a simple example data set to explain how to obtain numerical results equivalent to those provided in Tables 4 and 5 below for the Hawaiian archipelago example data set.

The R function iDIP requires two input matrices:

1. Abundance data specifying species/alleles (rows) raw or relative abundances for each population/community (columns).

2. Structure matrix describing the hierarchical structure of spatial subdivision; see a simple example given in Data S1. There is no limit to the number of spatial subdivisions.

The output includes (i) gamma (or total) diversity, alpha and beta diversity for each level; (ii) proportion of total beta information (among aggregates) found at each level; and (iii) mean differentiation (dissimilarity) at each level.

We also provide the R function iDIPphylo, which implements an information-based decomposition of phylogenetic diversity and, therefore, can take into account the evolutionary history of the species being studied. This function requires the two matrices mentioned above plus a phylogenetic tree in Newick format. For interested users without knowledge of R, we also provide an online version available from https://chao.shinyapps.io/iDIP/. This interactive web application was developed using Shiny (https://shiny.rstudio.com). The webpage contains tabs providing a short introduction describing how to use the tool, along with a detailed User's Guide, which provides proper interpretations of the output through numerical examples.

6 | SIMULATION STUDY TO SHOW THE CHARACTERISTICS OF THE FRAMEWORK

Here, we describe a simple simulation study to demonstrate the utility and numerical behaviour of the proposed framework. We considered an ecosystem composed of 32 populations divided into four hierarchical levels (ecosystem, region, subregion, population; Figure 1). The number of populations at each level was kept constant across all simulations (i.e., ecosystem with 32 populations, regions with 16 populations each and subregions with eight

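TABLE 3  Formulas for α, β and γ, along with differentiation measures at each hierarchical level of spatial subdivision, for species/allelic diversity and phylogenetic diversity. Here $D = {}^{1}D$ (Hill number of order q = 1 in Equation 2); $PD = {}^{1}PD$ (phylogenetic diversity of order q = 1 in Equation 5); T denotes the depth of an ultrametric tree; H = Shannon entropy (Equation 2); I = phylogenetic entropy (Equation 6).

| Hierarchical level | Diversity | Species/allelic diversity | Phylogenetic diversity |
|---|---|---|---|
| Level 3: Ecosystem | gamma | $D_\gamma = \exp\left\{ -\sum_{i=1}^{S} p_{i \mid ++} \ln p_{i \mid ++} \right\} \equiv \exp(H_\gamma)$ | $PD_\gamma = T \times \exp\left\{ -\sum_{i=1}^{B} L_i a_{i \mid ++} \ln a_{i \mid ++} / T \right\} \equiv T \times \exp(I_\gamma / T)$ |
| Level 2: Region | gamma | $D_\gamma^{(2)} = D_\gamma$ | $PD_\gamma^{(2)} = PD_\gamma$ |
| | alpha | $D_\alpha^{(2)} = \exp(H_\alpha^{(2)})$, where $H_\alpha^{(2)} = \sum_k w_{+k} H_{\alpha k}^{(2)}$ and $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i \mid +k} \ln p_{i \mid +k}$ | $PD_\alpha^{(2)} = T \times \exp(I_\alpha^{(2)} / T)$, where $I_\alpha^{(2)} = \sum_k w_{+k} I_{\alpha k}^{(2)}$ and $I_{\alpha k}^{(2)} = -\sum_{i=1}^{B} L_i a_{i \mid +k} \ln a_{i \mid +k}$ |
| | beta | $D_\beta^{(2)} = D_\gamma^{(2)} / D_\alpha^{(2)}$ | $PD_\beta^{(2)} = PD_\gamma^{(2)} / PD_\alpha^{(2)}$ |
| Level 1: Population or community | gamma | $D_\gamma^{(1)} = D_\alpha^{(2)}$ | $PD_\gamma^{(1)} = PD_\alpha^{(2)}$ |
| | alpha | $D_\alpha^{(1)} = \exp(H_\alpha^{(1)})$, where $H_\alpha^{(1)} = \sum_{j,k} w_{jk} H_{\alpha jk}^{(1)}$ and $H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i \mid jk} \ln p_{i \mid jk}$ | $PD_\alpha^{(1)} = T \times \exp(I_\alpha^{(1)} / T)$, where $I_\alpha^{(1)} = \sum_{j,k} w_{jk} I_{\alpha jk}^{(1)}$ and $I_{\alpha jk}^{(1)} = -\sum_{i=1}^{B} L_i a_{i \mid jk} \ln a_{i \mid jk}$ |
| | beta | $D_\beta^{(1)} = D_\gamma^{(1)} / D_\alpha^{(1)}$ | $PD_\beta^{(1)} = PD_\gamma^{(1)} / PD_\alpha^{(1)}$ |

Differentiation among aggregates at each level

| Hierarchical level | Species/allelic diversity | Phylogenetic diversity |
|---|---|---|
| Level 2: Among regions | $\Delta_D^{(2)} = \dfrac{H_\gamma - H_\alpha^{(2)}}{-\sum_k w_{+k} \ln w_{+k}}$ | $\Delta_{PD}^{(2)} = \dfrac{I_\gamma - I_\alpha^{(2)}}{-T \sum_k w_{+k} \ln w_{+k}}$ |
| Level 1: Population/community within region | $\Delta_D^{(1)} = \dfrac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{j,k} w_{jk} \ln(w_{jk} / w_{+k})}$ | $\Delta_{PD}^{(1)} = \dfrac{I_\alpha^{(2)} - I_\alpha^{(1)}}{-T \sum_{j,k} w_{jk} \ln(w_{jk} / w_{+k})}$ |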
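The species/allelic column of Table 3 can be sketched in a few lines of Python. This is an illustrative reimplementation rather than the iDIP code itself: the function names, the nested-list input layout and the example data are hypothetical, the weights are assumed to be proportional to population size ($w_{jk} = N_{+jk}/N_{+++}$, which is consistent with Table 2), and every population is assumed to be non-empty.

```python
import math

def shannon(p):
    """Shannon entropy (natural log) of a frequency vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def decompose(counts):
    """Three-level decomposition of order q = 1 (species/allelic column of Table 3).

    counts[k][j][i] is the abundance N_ijk of species/allele i
    in population j of region k.
    """
    n_total = sum(sum(pop) for region in counts for pop in region)
    n_elem = len(counts[0][0])
    # Assumed weights: w_jk = N_+jk / N_+++ and w_+k = sum_j w_jk.
    w_jk = [[sum(pop) / n_total for pop in region] for region in counts]
    w_k = [sum(ws) for ws in w_jk]
    # Relative frequencies at the three levels (Table 2).
    p_jk = [[[n / sum(pop) for n in pop] for pop in region] for region in counts]
    p_k = [[sum(w_jk[k][j] / w_k[k] * p_jk[k][j][i] for j in range(len(region)))
            for i in range(n_elem)] for k, region in enumerate(counts)]
    p_eco = [sum(w_k[k] * p_k[k][i] for k in range(len(counts)))
             for i in range(n_elem)]
    # Gamma entropy and weighted mean alpha entropies at levels 2 and 1.
    h_gamma = shannon(p_eco)
    h2 = sum(w_k[k] * shannon(p_k[k]) for k in range(len(counts)))
    h1 = sum(w_jk[k][j] * shannon(p_jk[k][j])
             for k, region in enumerate(counts) for j in range(len(region)))
    # Normalizing constants of the differentiation measures.
    norm2 = -sum(w * math.log(w) for w in w_k)
    norm1 = -sum(w_jk[k][j] * math.log(w_jk[k][j] / w_k[k])
                 for k, region in enumerate(counts) for j in range(len(region)))
    return {
        "D_gamma": math.exp(h_gamma),
        "D2_alpha": math.exp(h2), "D2_beta": math.exp(h_gamma - h2),
        "D1_alpha": math.exp(h1), "D1_beta": math.exp(h2 - h1),
        "Delta2": (h_gamma - h2) / norm2,
        "Delta1": (h2 - h1) / norm1,
    }

# Hypothetical data: two regions, two populations each, no shared elements.
example = [
    [[10, 0, 0, 0], [0, 10, 0, 0]],
    [[0, 0, 10, 0], [0, 0, 0, 10]],
]
result = decompose(example)
```

In this extreme example no element is shared between any two populations, so both differentiation measures equal 1, and the multiplicative decomposition of Table 1 holds: $D_\gamma = D_\alpha^{(1)} D_\beta^{(1)} D_\beta^{(2)}$.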


populations each). Note that here we used a hierarchy with four spatial subdivisions instead of three levels as used in the presentation of the framework. This decision was based on the fact that we wanted to simplify the presentation of calculations (three levels used), and in the simulations (four levels used), we wanted to verify the performance of the framework in a more in-depth manner.

We explored six scenarios varying in the degree of genetic structuring from very strong (Figure 2, top left panel) to very weak (Figure 2, bottom right panel), and for each, we generated spatially structured genetic data for 10 unlinked bi-allelic loci using an algorithm loosely based on the genetic model of Coop, Witonsky, Di Rienzo, and Pritchard (2010). More explicitly, to generate correlated allele frequencies across populations for bi-allelic loci, we drew 10 random vectors of dimension 32 from a multivariate normal distribution with mean zero and a covariance matrix corresponding to the particular genetic structure scenario being considered. To construct the covariance matrix, we first assumed that the covariance between populations decreased with distance, so that the off-diagonal elements (covariances) for closest geographic neighbours were set to 4, for the second nearest neighbours were set to 3 and so on; as such, the main diagonal values (variance) were set to 5. By multiplying the off-diagonal elements of this variance–covariance matrix by a constant (δ), we manipulated the strength of the spatial genetic structure from strong (δ = 0.1; Figure 2) to weak (δ = 6; Figure 2). Delta values were chosen to demonstrate gradual changes in estimates across diversity components. Using this procedure, we generated a matrix of random normally distributed N(0, 1) deviates $\varepsilon_{il}$ for each population i and locus l. The random deviates were transformed into allele frequencies constrained between 0 and 1 using a simple transform that truncates each deviate to the interval [0, 1],

where $p_{il}$ is the relative frequency of allele A1 at the lth locus in population i and, therefore, $q_{il} = (1 - p_{il})$ is the relative frequency of allele A2. Each bi-allelic locus was analysed separately by our framework, and estimated values of $D_\gamma$, $D_\alpha$ and $D_\beta$ for each spatial level (see Figure 1) were averaged across the 10 loci.

To simulate a realistic distribution of number of individuals across populations, we generated random values from a log-normal distribution with mean 0 and log of standard deviation 1; these values were then multiplied by randomly generated deviates from a Poisson distribution with λ = 30 to obtain a wide range of population/community sizes. Rounded values (to mimic abundances of individuals) were then multiplied by $p_{il}$ and $q_{il}$ to generate allele abundances. Given that number of individuals was randomly generated across populations, there is no spatial correlation in abundance of individuals across the landscape, which means that the genetic spatial patterns were solely determined by the variance–covariance matrix used to generate correlated allele frequencies across populations. This facilitates interpretation of the simulation results, allowing us to demonstrate that the framework can uncover subtle spatial effects associated with population connectivity (see below).
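The last two steps of the simulation (truncating the deviates to allele frequencies and generating allele abundances) can be sketched as follows. This is not the authors' simulation code: the function names are hypothetical, independent standard normal draws stand in for the correlated multivariate normal deviates described above, the log-normal parameters are interpreted as a log-scale mean of 0 and a log-scale standard deviation of 1, and the Poisson sampler is a simple textbook (Knuth) routine because the Python standard library does not provide one.

```python
import math
import random

def clip_to_frequency(eps):
    """Truncate a normal deviate to [0, 1] to obtain an allele frequency p_il."""
    return min(1.0, max(0.0, eps))

def poisson(lam, rng):
    """Draw a Poisson(lam) deviate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_allele_counts(deviates, rng, lam=30):
    """Return (A1 abundance, A2 abundance) per population for one locus.

    deviates[i] is the deviate for population i at this locus.
    """
    counts = []
    for eps in deviates:
        p = clip_to_frequency(eps)
        # Population size: log-normal value times a Poisson deviate, rounded
        # to mimic abundances of individuals.
        size = round(rng.lognormvariate(0.0, 1.0) * poisson(lam, rng))
        counts.append((size * p, size * (1.0 - p)))
    return counts

rng = random.Random(42)
# Assumption: independent N(0, 1) draws replace one correlated draw of dimension 32.
deviates = [rng.gauss(0.0, 1.0) for _ in range(32)]
allele_counts = simulate_allele_counts(deviates, rng)
```

Because population sizes are drawn independently of the deviates, any spatial pattern in the resulting allele abundances comes only from the correlation structure of the deviates, which is the point made in the paragraph above.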

For each spatial structure, we generated 100 matrices of allele frequencies, and each matrix was analysed separately to obtain distributions for $D_\gamma$, $D_\alpha$, $D_\beta$ and $\Delta_D$. Figure 2 presents heat maps of the correlation in allele frequencies across populations for one simulated data set under each δ value and shows that our algorithm can generate a wide range of genetic structures comparable to those generated by other more complex simulation protocols (e.g., de Villemereuil, Frichot, Bazin, Francois, & Gaggiotti, 2014).

Figure 3 shows the distribution of $D_\alpha$, $D_\beta$ and $\Delta_D$ values for the three levels of geographic variation below the ecosystem level (i.e., $D_\gamma$, genetic diversity). The results clearly show that our framework detects differences in genetic diversity across different levels of spatial genetic structure. As expected, the effective number of alleles ($D_\alpha$ component, top row) increases per region and subregion as the spatial structure becomes weaker (i.e., from small to large δ values) but remains constant at the population level, as there is no spatial structure at this level (i.e., populations are panmictic), so diversity is independent of δ.

The $D_\beta$ component (middle row) quantifies the effective number of aggregates (regions, subregions, populations) at each hierarchical level of spatial subdivision. The larger the number of aggregates at a given level, the more heterogeneous that level is. Thus, it is also a measure of compositional dissimilarity at each level. We use this interpretation to describe the results in a more intuitive manner. As expected, as δ increases, dissimilarity between regions (middle left panel) decreases because spatial genetic structure becomes weaker, and the compositional dissimilarity among populations within subregions (middle right panel) increases because the strong spatial correlation among populations within subregions breaks down (Figure 3, centre left panel). The compositional dissimilarity between subregions within regions (Figure 3, middle centre panel) first increases and then decreases with increasing δ. This is due to an "edge effect" associated with the marginal status of the subregions at the extremes of the species range (extreme right and left subregions in Figure 2). As δ increases, the composition of the two subregions at the centre of the species range, which belong to different regions, changes more rapidly than that of the two marginal subregions. Thus, the compositional dissimilarity between subregions within regions increases. However, as δ continues to increase, spatial effects disappear and dissimilarity decreases.

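$$ p_{il} = \begin{cases} 0 & \text{if } \varepsilon_{il} < 0 \\ \varepsilon_{il} & \text{if } 0 \le \varepsilon_{il} \le 1 \\ 1 & \text{if } \varepsilon_{il} > 1 \end{cases} $$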

FIGURE 2  Heat maps of allele frequency correlations between pairs of populations for different δ values. Delta values control the strength of the spatial genetic structure among populations, with low δs having the strongest spatial correlation among populations. Each heat map represents the outcome of a single simulation, and each dot represents the allele frequency correlation between two populations. Thus, the diagonal represents the correlation of a population with itself and is always 1 regardless of the δ value considered in the simulation. Colours indicate range of correlation values. As in Figure 1, the dendrograms represent the spatial relationship (i.e., geographic distance) between populations.


The differentiation component $\Delta_D$ (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity $D_\beta$. This is expected, as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix

FIGURE 3  Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity gamma is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions).


in which different δ values would be used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength in spatial genetic variation.

For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ: $D_\gamma$ = 1.6 on average across simulations for δ = 0.1, up to $D_\gamma$ = 1.9 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here region, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.

7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM

All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish Etelis coruscans (Andrews et al., 2014) and a shallow-water fish Zebrasoma flavescens (Eble et al., 2011).

The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string

FIGURE 4  Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass.


of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km southwest of the MHI at Johnston Atoll and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).

We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.

7.1 | Species diversity

Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, $D_\gamma$, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of another shallow-water coral reef ecosystem, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_\alpha^{(2)}$ = 37.77; Island: $D_\alpha^{(1)}$ = 27.75). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_\beta^{(2)}$ = 1.29 represents the number of region equivalents in the Hawaiian archipelago, while $D_\beta^{(1)}$ = 1.361 is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain $\Delta_D$ (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
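The beta diversities quoted above follow directly from the gamma and alpha values reported in Table 4 through the ratios defined in Table 1. The short check below uses only those reported values; the variable names are illustrative.

```python
# Values reported in Table 4 (fish species diversity, order q = 1).
d_gamma = 48.744          # ecosystem-level (gamma) diversity
d_alpha_region = 37.773   # mean within-region (alpha) diversity, level 2
d_alpha_island = 27.752   # mean within-island (alpha) diversity, level 1

# Beta diversity at each level is gamma divided by alpha (Table 1).
beta_region = d_gamma / d_alpha_region         # region equivalents
beta_island = d_alpha_region / d_alpha_island  # island equivalents per region

# Species equivalents lost when descending one level in the hierarchy.
lost_to_region = d_gamma - d_alpha_region
lost_to_island = d_alpha_region - d_alpha_island

print(round(beta_region, 3), round(beta_island, 3))
print(round(lost_to_region, 1), round(lost_to_island, 1))
```

Rounded to three decimals, the two ratios agree with the beta values listed in Table 4, and the two differences are close to the "approximately 10 species equivalents" mentioned in the text.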

We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than those of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and to a lesser extent Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
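To illustrate how the order q changes the weight given to rare elements, the sketch below computes Hill numbers of order q = 0, 1 and 2 for two hypothetical communities: one perfectly even, and one dominated by a single species. The function name and both frequency vectors are invented for illustration and are not the Hawaiian data.

```python
import math

def hill_number(p, q):
    """Hill number (effective number of elements) of order q."""
    p = [x for x in p if x > 0]
    if q == 1:
        # Limit q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

# Hypothetical communities (relative abundances sum to 1).
even = [0.25] * 4                 # four equally common species
dominated = [0.55] + [0.01] * 45  # one dominant species, 45 rare ones

profile_even = [hill_number(even, q) for q in (0, 1, 2)]
profile_dominated = [hill_number(dominated, q) for q in (0, 1, 2)]
```

For the even community all three orders give the same value (the number of species), whereas for the dominated community the diversity drops sharply from q = 0 to q = 2 because the many rare species contribute less and less as q increases.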

Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.

7.2 | Genetic diversity

Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases, genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed

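TABLE 4  Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

| Level | Diversity |
|---|---|
| 3 Hawaiian Archipelago | $D_\gamma$ = 48.744 |
| 2 Region | $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)}$ = 37.773; $D_\beta^{(2)}$ = 1.290 |
| 1 Island (community) | $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)}$ = 27.752; $D_\beta^{(1)}$ = 1.361 |

Differentiation among aggregates at each level

| Level | Differentiation |
|---|---|
| 2 Region | $\Delta_D^{(2)}$ = 0.290 |
| 1 Island (community) | $\Delta_D^{(1)}$ = 0.153 |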


among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.

Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.

As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).

8 | DISCUSSION

Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications, including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.

Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and, therefore, give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g.,

FIGURE 5  Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities; (b) genetic diversity for Etelis coruscans; (c) genetic diversity for Zebrasoma flavescens


bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.

As proved by Chao et al. (2015, Appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as $G_{ST}$ and Jost's D, do not possess any of these two properties.

Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and

FIGURE 6  Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness; (b) allelic richness for Etelis coruscans; (c) allelic richness for Zebrasoma flavescens

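TABLE 5  Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

| Level | Diversity |
|---|---|
| 3 Hawaiian Archipelago | $D_\gamma$ = 8.249 |
| 2 Region | $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)}$ = 8.083; $D_\beta^{(2)}$ = 1.016 |
| 1 Island (population) | $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)}$ = 7.077; $D_\beta^{(1)}$ = 1.117 |

Differentiation among aggregates at each level

| Level | Differentiation |
|---|---|
| 2 Region | $\Delta_D^{(2)}$ = 0.023 |
| 1 Island (community) | $\Delta_D^{(1)}$ = 0.062 |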


our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.

Recently, Karlin and Smouse (2017, Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: If N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
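The two-population example above can be checked numerically. The sketch below applies the two-level form of the differentiation measure, $(H_\gamma - H_\alpha)/(-\sum_j w_j \ln w_j)$, to populations I and II under the assumption of equal weights; the function names and the allele labelling (alleles 0 to 15, with alleles 6 to 9 shared) are illustrative choices.

```python
import math

def shannon(p):
    """Shannon entropy (natural log) of a frequency vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def differentiation(populations, weights):
    """Two-level information-based differentiation (normalized mutual information)."""
    n_alleles = len(populations[0])
    pooled = [sum(w * pop[i] for w, pop in zip(weights, populations))
              for i in range(n_alleles)]
    h_gamma = shannon(pooled)
    h_alpha = sum(w * shannon(pop) for w, pop in zip(weights, populations))
    return (h_gamma - h_alpha) / shannon(weights)

# 16 alleles in total; each population carries 10 at frequency 0.1, 4 shared.
pop_one = [0.1] * 10 + [0.0] * 6
pop_two = [0.0] * 6 + [0.1] * 10
delta = differentiation([pop_one, pop_two], [0.5, 0.5])
print(round(delta, 6))  # 0.6, i.e. 1 - A/S with A = 4 and S = 10
```

The result equals 1 − A/S = 1 − 4/10, the true proportion of nonshared alleles, which is the "true dissimilarity" property described in the text.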

In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would be surely observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.

Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity $D_\beta$ and differentiation $\Delta_D$ measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.

The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species and, therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.

Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural

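TABLE 6  Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

| Level | Diversity |
|---|---|
| 3 Hawaiian Archipelago | $D_\gamma$ = 8.404 |
| 2 Region | $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)}$ = 8.290; $D_\beta^{(2)}$ = 1.012 |
| 1 Island (community) | $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)}$ = 7.690; $D_\beta^{(1)}$ = 1.065 |

Differentiation among aggregates at each level

| Level | Differentiation |
|---|---|
| 2 Region | $\Delta_D^{(2)}$ = 0.014 |
| 1 Island (community) | $\Delta_D^{(1)}$ = 0.033 |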


populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization and, therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.

In our hierarchical framework and analysis based on the Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)’s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)’s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon’s entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals, or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.

There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, the information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithms may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and “true dissimilarity” properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.

The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.

An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems). The framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.


ACKNOWLEDGEMENTS

This work was assisted through participation in the “Next Generation Genetic Monitoring” Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center’s Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.

DATA ARCHIVING STATEMENT

All data used in this manuscript are available in DRYAD (https://doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).

ORCID

Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493

Christine Edwards http://orcid.org/0000-0001-8837-4872

REFERENCES

Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2

Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101

Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations (2nd ed.). Hoboken, NJ: Wiley-Blackwell.

Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis “marshi” (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665

Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.

Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.

Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1 (edited by S. E. Fienberg), 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654

Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551

Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247

Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272

Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45 (edited by D. J. Futuyma), 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540

Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349

Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471

Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014

Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819

Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083

Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1

Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2

Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036

Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75

Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352

Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x

Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1

Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x

Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1

Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x

Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum x falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.


Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.

Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.

Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3

Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x

Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.

Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299

Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x

Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x

Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.

Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x

Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.

Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.

Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696

Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B: Biological Sciences, 283(1829).

Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a “pristine” coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z

Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765

Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x

Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.

Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422

Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756

de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705

Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.

Whitlock, M. C. (2011). G’ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x

Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516

Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.

Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593

APPENDIX S1 Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an “ideal” population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the “effective” number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
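The equivalence described in this example can be checked numerically. The following is a minimal sketch, assuming the heterozygosity-based effective number is computed as 1/(1 − He), i.e. the inverse of the sum of squared allele frequencies; the function names are illustrative and are not part of the authors' software.

```python
# Allele frequencies taken from the Appendix S1 example table.
pop1 = [0.5, 0.5, 0.0, 0.0, 0.0]
pop2 = [0.01, 0.10, 0.665, 0.215, 0.01]


def expected_heterozygosity(freqs):
    """Expected heterozygosity He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def effective_number_of_alleles(freqs):
    """Number of equally frequent alleles giving the same He: 1 / (1 - He)."""
    return 1.0 / (1.0 - expected_heterozygosity(freqs))


he1 = expected_heterozygosity(pop1)
he2 = expected_heterozygosity(pop2)
ne1 = effective_number_of_alleles(pop1)
ne2 = effective_number_of_alleles(pop2)

print(f"He: pop1={he1:.3f}, pop2={he2:.3f}")
print(f"Effective alleles: pop1={ne1:.3f}, pop2={ne2:.3f}")
```

Both populations come out with He close to 0.5 and an effective number of alleles close to two, even though population 2 carries five distinct alleles.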

APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol                  Definition
$^{q}D$                 abundance diversity of order q, also referred to as Hill number of order q
$^{q}PD$                phylogenetic diversity of order q
$S$                     total number of distinct elements (species/alleles)
$H$                     Shannon entropy
$K$                     number of regions
$J_k$                   number of local populations/communities within region k
$w_{jk}$                weight given to population/community j of region k
$w_{+k}$                weight given to region k
$D_\alpha^{(l)}$        alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$         beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$        gamma abundance diversity at level l of the hierarchy
$l$                     superscript to denote population/community, subregion, region, …
$D_\gamma$              gamma abundance diversity at the ecosystem level
$N_{injk}$              number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$               total number of copies of allele/species i in population/community j of region k
$N_{+jk}$               total number of alleles/individuals in population j of region k
$N_{++k}$               total number of alleles/individuals in region k
$N_{+++}$               total number of alleles/individuals in the ecosystem
$p_{i|jk}$              frequency of allele/species i in population j of region k
$p_{i|+k}$              frequency of allele/species i in region k
$p_{i|++}$              frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$   alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$    alpha entropy for region k
$H_\alpha^{(l)}$        total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$        abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$                     number of branch segments in the phylogenetic tree
$L_i$                   length of branch i = 1, 2, 3, …, B
$a_i$                   total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$               mean branch length
$T$                     depth of an ultrametric tree (= $\bar{T}$)
$Q$                     Rao’s quadratic entropy
$I$                     phylogenetic entropy
$a_{i|jk}$              total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$              total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$              total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$       alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$        beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$       gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$             gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$   alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$    alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$        total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$     phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)


APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term “allele frequency” with “species frequency”. We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \left(\sum_{j=1}^{J} w_j p_{i|j}\right) \ln\left(\sum_{m=1}^{J} w_m p_{i|m}\right)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\left(\sum_{j=1}^{J} w_j p_{i|j}\right) \ln\left(\sum_{j=1}^{J} w_j p_{i|j}\right) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \left(\sum_{j=1}^{J} w_j p_{i|j}\right) \ln\left(\sum_{j=1}^{J} w_j p_{i|j}\right) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \left( \sum_{j=1}^{J} w_j p_{i|j} \ln\left( \sum_{m=1}^{J} w_m p_{i|m} \right) \right) \\
&\le -\sum_{i=1}^{S} \left( \sum_{j=1}^{J} w_j p_{i|j} \ln\left( w_j p_{i|j} \right) \right) \\
&= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j \\
&= H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.
\end{aligned}$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.

From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$
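Eq. (C1) and its two bounds can be illustrated with a short numerical sketch. The code below is not the authors' iDIP implementation; the function names and the toy frequency matrices are assumptions made only for illustration. Rows of `freqs` are populations (each row sums to 1) and `weights` are the population weights, which sum to 1.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy -sum p ln p, skipping zero entries."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def shannon_differentiation(freqs, weights):
    """Normalized Shannon differentiation of Eq. (C1) for a two-level hierarchy.

    freqs[j][i] is the relative frequency of allele i in population j,
    weights[j] is the weight of population j.
    """
    n_alleles = len(freqs[0])
    # Pooled (ecosystem) frequencies p_{i|+} = sum_j w_j p_{i|j}.
    pooled = [sum(w * pop[i] for w, pop in zip(weights, freqs))
              for i in range(n_alleles)]
    h_gamma = shannon_entropy(pooled)
    h_alpha = sum(w * shannon_entropy(pop) for w, pop in zip(weights, freqs))
    h_max = shannon_entropy(weights)  # upper bound -sum_j w_j ln w_j
    return (h_gamma - h_alpha) / h_max


weights = [0.3, 0.7]
identical = shannon_differentiation([[0.5, 0.5], [0.5, 0.5]], weights)
distinct = shannon_differentiation([[1.0, 0.0], [0.0, 1.0]], weights)
partial = shannon_differentiation([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]], weights)

print(identical, distinct, partial)
```

Identical populations give a value of 0 and completely distinct populations give 1 (up to floating-point error), in line with Theorem S3.1; partially overlapping populations fall strictly in between.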

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and “true dissimilarity” properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) The Shannon differentiation measure is always non-decreasing when a new allele is added to a single population with any abundance. Here, the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the “true dissimilarity” property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.

Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the “gamma” entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \left(\sum_{k=1}^{K} w_{+k} p_{i|+k}\right) \ln\left(\sum_{m=1}^{K} w_{+m} p_{i|+m}\right).$$

The corresponding “alpha” entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele i in population j, with population weights $\left(\frac{w_{1k}}{w_{+k}}, \frac{w_{2k}}{w_{+k}}, \ldots, \frac{w_{J_k k}}{w_{+k}}\right)$, $j = 1, 2, \ldots, J_k$. The “gamma” entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} p_{i|jk}$. The corresponding “alpha” entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln\frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\left(\frac{w_{jk}}{w_{+k}}\right)}. \tag{C6}$$

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \left( \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\left( w_{jk} p_{i|jk} \right) \right) \\
&= -\sum_{i=1}^{S} \left( \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \left[ \ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k} \right] \right) \\
&= -\sum_{i=1}^{S} \left( \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk} \right) - \sum_{i=1}^{S} \left( \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\frac{w_{jk}}{w_{+k}} \right) - \sum_{i=1}^{S} \left( \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k} \right) \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right].
\end{aligned}$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
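The three-level decomposition of Eq. (C2) and the two normalized measures of Eqs. (C5) and (C6) can be sketched numerically as follows. This is an illustrative sketch only, not the authors' iDIP code: the function names, the nested-list data layout and the toy weights are assumptions. Here `freqs[k][j]` holds the allele frequencies of population j in region k, and `weights[k][j]` is the weight of that population, with all weights summing to 1.

```python
import math


def entropy(probs):
    """Shannon entropy -sum p ln p, skipping zero entries."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def three_level_differentiation(freqs, weights):
    """Entropies and normalized differentiation for ecosystem/region/population."""
    n_alleles = len(freqs[0][0])
    w_region = [sum(w_k) for w_k in weights]  # region weights w_{+k}
    # Region-level frequencies p_{i|+k} = sum_j (w_jk / w_{+k}) p_{i|jk}.
    p_region = [
        [sum(w * pop[i] for w, pop in zip(w_k, f_k)) / w_plus
         for i in range(n_alleles)]
        for f_k, w_k, w_plus in zip(freqs, weights, w_region)
    ]
    # Ecosystem-level frequencies p_{i|++} = sum_k w_{+k} p_{i|+k}.
    p_total = [sum(w_plus * p_k[i] for w_plus, p_k in zip(w_region, p_region))
               for i in range(n_alleles)]

    h_gamma = entropy(p_total)
    h_alpha2 = sum(w_plus * entropy(p_k)
                   for w_plus, p_k in zip(w_region, p_region))
    h_alpha1 = sum(w * entropy(pop)
                   for f_k, w_k in zip(freqs, weights)
                   for w, pop in zip(w_k, f_k))

    max_among_regions = entropy(w_region)  # bound in Eq. (C3)
    max_within_regions = -sum(             # bound in Eq. (C4)
        w * math.log(w / w_plus)
        for w_k, w_plus in zip(weights, w_region)
        for w in w_k if w > 0
    )
    return {
        "H_gamma": h_gamma,
        "H_alpha2": h_alpha2,
        "H_alpha1": h_alpha1,
        "delta_regions": (h_gamma - h_alpha2) / max_among_regions,       # Eq. (C5)
        "delta_populations": (h_alpha2 - h_alpha1) / max_within_regions,  # Eq. (C6)
    }


# Two regions holding two and three populations, with hypothetical weights.
weights = [[0.1, 0.2], [0.2, 0.2, 0.3]]

# Completely distinct case: every population is fixed for its own allele.
distinct_freqs = [
    [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0]],
    [[0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]],
]
# Identical case: every population has the same allele frequencies.
identical_freqs = [
    [[0.2] * 5, [0.2] * 5],
    [[0.2] * 5, [0.2] * 5, [0.2] * 5],
]

distinct = three_level_differentiation(distinct_freqs, weights)
identical = three_level_differentiation(identical_freqs, weights)
print(distinct)
print(identical)
```

In the completely distinct case both measures reach 1, and in the identical case both are 0, up to floating-point error, which matches the bounds stated in Theorem S3.3.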

As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and “true dissimilarity” properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. Phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \left[I_\alpha^{(2)} - I_\alpha^{(1)}\right] + \left[I_\gamma - I_\alpha^{(2)}\right]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and “true dissimilarity” properties, are parallel to those of allelic diversity and thus are omitted.

SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represents the ecosystem. Shannon entropies $H_{*}$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 labels. Hierarchical levels: 3 Ecosystem, 2 Region, 1 Population or Community. Columns: Within, Between, Total, Decomposition. Frequencies: $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$ (populations); $p_{i|+1}$, $p_{i|+2}$ (regions); $p_{i|++}$ (ecosystem). Entropies: $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$, $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$, $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$, $H_\alpha^{(1)}$, $H_\alpha^{(2)}$, $H_\gamma$.]


Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \cdots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \cdots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2: ultrametric tree with tips labelled p1, p2, p3, p4, p5; internal nodes labelled p1 + p2, p4 + p5 and p1 + p2 + p3; the root is labelled MRCA.]


R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as follows:

Ecosystem
    Region 1:  Population 1, Population 2
    Region 2:  Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

           Pop1   Pop2   Pop3   Pop4   Pop5
Allele1       1     16      2     10     15
Allele2       0      0      0      5     14
Allele3       7     12     11      1      0
Allele4       0      5     14      1     21
Allele5       2      1      0     11     10
Allele6       0      1      3      2      0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure with any number of levels can be expressed in a similar manner.

           Pop1          Pop2          Pop3          Pop4          Pop5
Level 3    Ecosystem     Ecosystem     Ecosystem     Ecosystem     Ecosystem
Level 2    Region1       Region1       Region2       Region2       Region2
Level 1    Population1   Population2   Population3   Population4   Population5

For the above simple example, the input data for the R function iDIP (code given below) are as follows.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

# Run R function
iDIP(Data, Struc)
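The statement that a hierarchy with any number of levels can be expressed in the same manner can be illustrated with a small sketch. The four-level design below (ecosystem, region, subregion, population), the object name `Struc4` and the helper `is_nested` are hypothetical and are not part of the iDIP code; the sketch only assumes that rows run from the highest level down to the population level, as in the three-level example.

```r
# Hypothetical four-level design: ecosystem > region > subregion > population.
# Rows run from the highest level (row 1) to the population level (last row),
# matching the row order of the three-level Struc example.
Struc4 <- rbind(
  rep("Ecosystem", 6),
  c(rep("Region1", 4), rep("Region2", 2)),
  c(rep("Sub1", 2), rep("Sub2", 2), rep("Sub3", 2)),
  paste0("Population", 1:6)
)

# Illustrative helper (not part of iDIP): check that every unit in a row
# sits inside exactly one unit of the row above it.
is_nested <- function(struc) {
  all(sapply(seq_len(nrow(struc) - 1), function(r) {
    parents_per_child <- tapply(struc[r, ], struc[r + 1, ],
                                function(x) length(unique(x)))
    all(parents_per_child == 1)
  }))
}

is_nested(Struc4)
```

A check of this kind may be useful before running a partitioning, because a subregion that straddles two regions would not correspond to a nested hierarchy.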


Output is given below:

                      [,1]
D_gamma              5.272
D_alpha.2            4.679
D_alpha.1            3.540
D_beta.2             1.127
D_beta.1             1.322
Proportion.2         0.300
Proportion.1         0.700
Differentiation.2    0.204
Differentiation.1    0.310

We give simple interpretations for the above output (throughout, "effective" is in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha.2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta.2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).

(1c) D_alpha.1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta.1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha.2).

(2) Proportion.2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion.1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation.2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation.1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
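The relationships among the reported quantities can be checked with a few lines of arithmetic. The sketch below uses only the rounded values from the output table, so agreement is expected to the third decimal at best. It also assumes that the proportions are ratios of log-transformed beta diversities, which is how the "beta information" reading above translates to the entropy scale.

```r
# Reported iDIP output for the worked example (rounded values from the table)
D_gamma  <- 5.272
D_alpha2 <- 4.679
D_alpha1 <- 3.540
D_beta2  <- 1.127
D_beta1  <- 1.322

# Multiplicative partitioning: alpha x beta reproduces the next level up
check_region <- D_alpha2 * D_beta2   # compare with D_gamma
check_pop    <- D_alpha1 * D_beta1   # compare with D_alpha2

# Share of total beta information at each level, on the log (entropy) scale
total_beta_info <- log(D_gamma / D_alpha1)
prop2 <- log(D_beta2) / total_beta_info   # regional level
prop1 <- log(D_beta1) / total_beta_info   # population level

round(c(check_region, check_pop, prop2, prop1), 3)
```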

# R code for Information-based Diversity Partitioning for Any Number of Levels
#
# Input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community
#          frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix)
#
# Output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level
#   (2) Proportion of total beta information found at each level
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation.1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation.2 measures the mean dissimilarity among regions (level 2), etc.
#
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at
#       least one species/allele in a population or a community.

# Main function iDIP
iDIP = function(abun, struc){
  n = sum(abun); N = ncol(abun);
  ga = rowSums(abun);
  gp = ga[ga>0]/n;
  G = sum(-gp*log(gp));                       # gamma (pooled) Shannon entropy
  S = length(gp);

  H = nrow(struc);                            # number of hierarchical levels
  A = numeric(H-1); W = numeric(H-1); B = numeric(H-1);
  Diff = numeric(H-1); Prop = numeric(H-1);

  # Lowest (population) level: weights and within-population entropies
  wi = colSums(abun)/n;
  W[H-1] = -sum(wi[wi>0]*log(wi[wi>0]));
  pi = sapply(1:N, function(k) abun[,k]/sum(abun[,k]));
  Ai = sapply(1:N, function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])));
  A[H-1] = sum(wi*Ai);

  # Intermediate levels: pool the populations belonging to each unit
  if(H>2){
    for(i in 2:(H-1)){
      I = unique(struc[i,]); NN = length(I);
      ai = matrix(0, ncol=NN, nrow=nrow(abun));
      for(j in 1:NN){
        II = which(struc[i,]==I[j]);
        if(length(II)==1){ai[,j] = abun[,II];
        }else{ai[,j] = rowSums(abun[,II])}
      }
      pi = sapply(1:NN, function(k) ai[,k]/sum(ai[,k]));
      wi = colSums(ai)/sum(ai);
      W[i-1] = -sum(wi*log(wi));
      Ai = sapply(1:NN, function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])));
      A[i-1] = sum(wi*Ai);
    }
  }

  # Differentiation, proportion of beta information and beta diversity per level
  total = G-A[H-1];
  Diff[1] = (G-A[1])/W[1];
  Prop[1] = (G-A[1])/total;
  B[1] = exp(G)/exp(A[1]);
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i] = (A[i-1]-A[i])/(W[i]-W[i-1]);
      Prop[i] = (A[i-1]-A[i])/total;
      B[i] = exp(A[i-1])/exp(A[i]);
    }
  }

  Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop;
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol=1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha.", (H-1):1),
                     paste0("D_beta.", (H-1):1),
                     paste0("Proportion.", (H-1):1),
                     paste0("Differentiation.", (H-1):1))
  return(out)
}
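As a cross-check that does not depend on the function above, the gamma diversity of the worked example can be recomputed directly from the pooled allele counts. This is a minimal sketch that assumes the example matrix `Data` from the input section; the intermediate names (`pooled`, `H_gamma`, `w_pop`) are introduced here for illustration only.

```r
# Example allele counts (alleles in rows, populations in columns)
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))

# Pool all populations, which weights each one by its share of the total count
pooled <- rowSums(Data) / sum(Data)
pooled <- pooled[pooled > 0]

H_gamma <- -sum(pooled * log(pooled))   # Shannon entropy of the pooled data
D_gamma <- exp(H_gamma)                 # Hill number of order q = 1

# Population weights implied by the raw counts
w_pop <- colSums(Data) / sum(Data)

round(D_gamma, 3)
```

The rounded result agrees with the D_gamma value reported in the output table (5.272).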


Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) described for species diversity, we also need to input a phylogenetic "Tree" in Newick tree format.

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for the 6 alleles is given below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) = paste0("Allele",1:6)
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

# Run R function
iDIPphylo(Data, Struc, Tree)

Output is given below:

                 [,1]
Faith's PD    321.525
mean_T         94.169
PD_gamma      274.388
PD_alpha.2    255.194
PD_alpha.1    223.231
PD_beta.2       1.075
PD_beta.1       1.143
PD_prop.2       0.351
PD_prop.1       0.649
PD_diff.2       0.124
PD_diff.1       0.149

The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha.2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta.2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma).

(3c) PD_alpha.1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta.1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha.2).

(4) PD_prop.2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop.1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff.2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff.1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
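The first two output rows can be reproduced by hand from the branch lengths of the example tree, without any phylogenetics package. In the sketch below, the labels of the internal branches (`anc12`, `anc123`, `anc456`) are names introduced here for illustration; the branch lengths and allele counts are those of the worked example.

```r
# Branch lengths of the example tree, labelled by the node each branch leads to
tip <- c(Allele1 = 16.66254448, Allele2 = 28.86156926, Allele3 = 59.19367445,
         Allele4 = 9.67060281,  Allele5 = 49.65919121, Allele6 = 15.361314)
internal <- c(anc12 = 43.70264926, anc123 = 43.49065302, anc456 = 54.92297125)

# Faith's PD: total branch length of the tree
faith_pd <- sum(tip) + sum(internal)

# Root-to-tip distance of each allele, following the Newick topology
depth <- c(
  Allele1 = tip[["Allele1"]] + internal[["anc12"]] + internal[["anc123"]],
  Allele2 = tip[["Allele2"]] + internal[["anc12"]] + internal[["anc123"]],
  Allele3 = tip[["Allele3"]] + internal[["anc123"]],
  Allele4 = tip[["Allele4"]] + internal[["anc456"]],
  Allele5 = tip[["Allele5"]] + internal[["anc456"]],
  Allele6 = tip[["Allele6"]] + internal[["anc456"]]
)

# Pooled relative abundances (rows of Data are Allele1 to Allele6, in order)
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
p <- rowSums(Data) / sum(Data)

# Abundance-weighted mean root-to-tip distance
mean_T <- sum(p * depth)

round(c(faith_pd = faith_pd, mean_T = mean_T), 3)
```

Because the root-to-tip distances differ among alleles, this simulated tree is not ultrametric, which is why the output reports the abundance-weighted mean rather than a single tree depth.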

# R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
#
# Input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community
#          frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix)
#   tree:  a Newick-format phylogenetic tree spanned by all focal species considered
#          in a study
#
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA"
#       in your data by 0 or any numerical value. Also, there must be at least one
#       species/allele in a population or a community.
#
# Output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root
#       node to each of the tips in a phylogenetic tree. For an ultrametric tree,
#       mean_T = tree depth
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD
#       for each level
#   (4) PD_prop.1 and PD_prop.2 measure the proportions of total phylogenetic beta
#       information found in Level 1 and Level 2, respectively
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example,
#       PD_diff.1 measures the mean phylogenetic dissimilarity among communities
#       (Level 1) within a region (Level 2); PD_diff.2 measures the mean phylogenetic
#       dissimilarity among regions (Level 2), etc.

# Three packages ("ade4", "ape" and "phytools") must be installed first
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

# Main function iDIPphylo
iDIPphylo = function(abun, struc, tree){
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes));

  # M[i, node] = 1 if the node lies on the path from the root to leaf i
  M = matrix(0, nrow=length(phyloData$leaves), ncol=length(nodenames),
             dimnames=list(names(phyloData$leaves), nodenames))
  for(i in 1:length(phyloData$leaves)){
    M[i,][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }

  # Node-by-population abundances and branch lengths
  pA = matrix(0, ncol=ncol(abun), nrow=length(nodenames),
              dimnames=list(nodenames, colnames(abun)))
  for(i in 1:ncol(abun)){pA[,i] = Temp[,i]%*%M;}
  pB = c(phyloData$leaves, phyloData$nodes)

  n = sum(abun); N = ncol(abun);
  ga = rowSums(pA);
  gp = ga/n; TT = sum(gp*pB);               # TT: abundance-weighted mean depth
  G = sum(-pB[gp>0]*gp[gp>0]/TT*log(gp[gp>0]/TT))
  PD = sum(pB[gp>0]);                       # Faith's PD

  H = nrow(struc);
  A = numeric(H-1); W = numeric(H-1); B = numeric(H-1);
  Diff = numeric(H-1); Prop = numeric(H-1);

  wi = colSums(abun)/n;
  W[H-1] = -sum(wi[wi>0]*log(wi[wi>0]));
  pi = sapply(1:N, function(k) pA[,k]/sum(abun[,k]));
  Ai = sapply(1:N, function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)));
  A[H-1] = sum(wi*Ai);

  if(H>2){
    for(i in 2:(H-1)){
      I = unique(struc[i,]); NN = length(I);
      pi = matrix(0, ncol=NN, nrow=nrow(pA)); ni = numeric(NN);
      for(j in 1:NN){
        II = which(struc[i,]==I[j]);
        if(length(II)==1){pi[,j] = pA[,II]/sum(abun[,II]); ni[j] = sum(abun[,II]);
        }else{pi[,j] = rowSums(pA[,II])/sum(abun[,II]); ni[j] = sum(abun[,II])}
      }
      # pi = sapply(1:NN, function(k) ai[,k]/sum(ai[,k]));   (inactive: 'ai' is not defined in this function)
      wi = ni/sum(ni);
      W[i-1] = -sum(wi*log(wi));
      Ai = sapply(1:NN, function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)));
      A[i-1] = sum(wi*Ai);
    }
  }

  total = G-A[H-1];
  Diff[1] = (G-A[1])/W[1];
  Prop[1] = (G-A[1])/total;
  B[1] = exp(G)/exp(A[1]);
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i] = (A[i-1]-A[i])/(W[i]-W[i-1]);
      Prop[i] = (A[i-1]-A[i])/total;
      B[i] = exp(A[i-1])/exp(A[i]);
    }
  }

  # The entropies above already include log(TT), so exp() gives effective branch lengths
  Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop;
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol=1)
  # out1 = iDIP(abun, struc); out = cbind(out1, out2)   (inactive: 'out2' is not defined)
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha.", (H-1):1),
                     paste0("PD_beta.", (H-1):1),
                     paste0("PD_prop.", (H-1):1),
                     paste0("PD_diff.", (H-1):1))
  return(out)
}


1 | INTRODUCTION

Biological diversity is a foundational concept in the life sciences and critical to strategies for ecological conservation. However, for many decades, biodiversity has been treated in a piecemeal manner, with ecologists focusing on species diversity (but more recently also on trait and phylogenetic diversity) and population geneticists focusing on genetic diversity. This dichotomy has led to large differences in the type of diversity indices that have been used to measure species, trait, phylogenetic and genetic diversity. Ecologists were initially focused on empirical developments and generated a very large number of species diversity indices that strongly differ in their numerical behaviour (Jost, 2006) and estimation properties (Bunge, Willis, & Walsh, 2014). On the other hand, population genetics was initially dominated by theoretical developments and mathematical models focused on a specific set of parameters that described genetic diversity within and among populations, which led to the development of a restricted set of genetic diversity indices. Thus, although biodiversity is inherently a hierarchical concept covering different levels of organization (genetic, population, species, ecological communities and ecosystems), the lack of diversity indices that behave consistently across these different levels has precluded the development of truly integrative biodiversity studies.

Recently, motivated by this lack of common measures for biodiversity at different levels of biological organization, population geneticists have carried out methodological developments that extend the use of popular species diversity indices to the measurement of genetic diversity at different levels of spatial subdivision [e.g., Shannon's and Simpson's indices (Sherwin, Jabot, Rush, & Rossetto, 2006; Smouse, Whitehead, & Peakall, 2015)]. However, simply adapting species diversity measures is not sufficient, for two reasons. First, there is much controversy over how to quantify abundance-based species diversity in a community (Mendes, Evangelista, Thomaz, Agostinho, & Gomes, 2008). Second, there has been little agreement on how to partition diversity into its spatial components (Ellison, 2010). A promising solution for a unified measure of genetic diversity centres on Hill numbers (Hill, 1973). Indeed, a consensus is emerging on the use of Hill numbers as a unifying concept to define measures of various types of diversity, including species, phylogenetic and functional diversities (Chao, Chiu, & Jost, 2014). Importantly, Hill numbers follow the replication principle, ensuring that diversity measures are linear in relation to group pooling. As such, they can be used to develop proper partition schemes across spatial scales or other hierarchical structures, such as populations within metapopulations, species within phylogenies and communities within ecosystems, and to pool information across different levels in a hierarchy.

The purpose of this study was to present a unifying framework for the measurement of biodiversity across hierarchical levels of organization, from local population to ecosystem levels. We expect that this new framework will be a useful tool for conservation biologists and will also facilitate the development of the fields of community genetics (Agrawal, 2003) and eco-evolutionary dynamics (Hendry, 2013). This new framework may also facilitate bridging community ecology processes (selection among species, drift, dispersal and speciation) and the processes emphasized by population genetics theory (selection within species, drift, gene flow and mutation), as explored by Vellend et al. (2014). The paper starts by outlining historical developments on the formulation and use of biodiversity measures in the fields of ecology and population genetics (Section 2). We then provide an overview of the use of Hill numbers in ecology and their relationship with population genetic parameters such as N_e (Section 3). Section 4 presents a weighted information-based decomposition framework that provides measures of both genetic and species diversity at all hierarchical levels of spatial subdivision, from populations to ecosystems. This is followed by the description of software that implements the approach (Section 5). Section 6 explores patterns of species and genetic diversity under different spatial subdivision models using simulated data with known diversity hierarchical structures. Section 7 shows an application to a real data set on coral reef biodiversity (Selkoe et al., 2016). We close with a discussion of the advantages and limitations of our approach and its applications in the fields of conservation biology, community genetics and eco-evolutionary dynamics.

2 | HISTORICAL DEVELOPMENTS

Arguably, the ultimate reason for methodological divergence in diversity indices used by population geneticists and community ecologists resides in the very different contexts that led to the emergence of these two disciplines. Ecologists were interested in understanding the processes that determine the structure and composition of communities and could directly measure the community traits (number of species and their abundances) needed to compare different communities. This relatively easy access to real data and an initially limited interest in mechanistic models fostered the development of a large number of diversity measures to explore species distributional data (Magurran, 2004) and eventually made the quantification of abundance-based species diversity one of the most controversial issues in ecology. Population genetics, on the other hand, arose in response to a need to reconcile two opposing views of evolution that hinged on the type of diversity upon which natural selection acted. Darwin proposed that it was small continuous variation, while Galton believed that natural selection acted upon large discontinuous variation (Provine, 1971). Variation in this case was an abstract concept and could not be directly measured, which motivated the development of a vast body of theory centred around mathematical models describing the behaviour of a restricted set of diversity measures (Provine, 1971).

Although ecologists and population geneticists use very different approaches to measure diversity, they are both interested in describing spatial patterns by decomposing total diversity into within- and among-community/population components. But here again, methodological developments differ greatly between the two disciplines. Ecologists engaged in intense debates on the choice of partitioning schemes (Jost, 2007), while population geneticists remained largely faithful to the use of so-called fixation indices proposed by Wright (1951). Nevertheless, the recently established fields of molecular ecology, community genetics and eco-evolutionary dynamics are helping to foster a convergence between the methods used to measure species and genetic diversity. Indeed, in the last decade, population geneticists have begun to extend the use of popular species diversity metrics to the measurement of genetic diversity by deriving mathematical expressions linking them with evolutionary parameters such as effective population size and mutation and migration rates (Chao et al., 2015; Sherwin, 2010; Sherwin et al., 2006; Smouse et al., 2015).

Regardless of this very recent methodological convergence, ecologists and population geneticists face the same challenges when trying to characterize how diversity components (alpha, beta) are structured geographically. These problems have been described in great detail in the literature (e.g., see Jost, 2007, 2010), so here we will only give a very brief summary. The first problem is that the commonly used within-community and within-population abundance diversity measures (e.g., Shannon-Wiener index and heterozygosity) are in fact entropies, meaning that they quantify the uncertainty in the species or allele identity of randomly sampled individuals or alleles, respectively. Importantly, these indices do not scale linearly with an increase in diversity, and some of them (e.g., heterozygosity) reach an asymptote for large values. The second problem is that the "within-" (alpha) and "between-" (beta) components of diversity are not independent. Intuitively, if beta depends on alpha, it would be impossible to compare beta diversities across all levels at which alpha diversities differ.
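The nonlinearity of entropy-type indices can be made concrete with idealized populations that carry S equally frequent alleles. The sketch below is illustrative only: the values of S are arbitrary, and the column names are introduced here. It contrasts expected heterozygosity and Shannon entropy with their number equivalents.

```r
# Populations with S equally frequent alleles (arbitrary illustrative values)
S <- c(2, 5, 10, 20, 100)

het     <- 1 - 1 / S   # expected heterozygosity, bounded above by 1
shannon <- log(S)      # Shannon entropy, grows only logarithmically

# Number equivalents recover S exactly, so they scale linearly with diversity
D1 <- exp(shannon)     # order q = 1
D2 <- 1 / (1 - het)    # order q = 2

data.frame(S, het, shannon, D1, D2)
```

Doubling the number of alleles from 10 to 20 raises heterozygosity only from 0.90 to 0.95, whereas both number equivalents double.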

Partitioning components of diversity is central to progress on these problems. Ecologists have related the traditional alpha, beta and gamma diversity using both additive and multiplicative schemes of partitioning. On the other hand, population geneticists have always used the multiplicative scheme based on the partitioning of the probability of identity by descent of pairs of alleles (inbreeding coefficients, F). Although there has been some confusion (cf. Jost, 2008; Jost et al., 2010; Meirmans & Hedrick, 2011), it is easy to demonstrate that all estimators of F_ST, a parameter that quantifies genetic structure, including G_ST (Nei, 1973) and θ (Weir & Cockerham, 1984), are based on the well-known multiplicative decomposition of Wright's (1951) F-statistics, (1 − F_IT) = (1 − F_IS)(1 − F_ST), where all terms are entropy measures describing the uncertainty in the identity by descent of pairs of alleles when they are sampled from the whole set of populations (metapopulation), (1 − F_IT), from within the same population, (1 − F_IS), or from two different populations, (1 − F_ST).
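A short numerical illustration of the multiplicative decomposition of F-statistics follows. The values of F_IS and F_ST are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Hypothetical fixation indices
F_IS <- 0.10   # within populations
F_ST <- 0.20   # among populations

# Wright's multiplicative decomposition: (1 - F_IT) = (1 - F_IS) * (1 - F_ST)
F_IT <- 1 - (1 - F_IS) * (1 - F_ST)

# The same relationship is additive on the log scale
log_parts <- c(within = log(1 - F_IS), among = log(1 - F_ST))

c(F_IT = F_IT, log_total = sum(log_parts))
```

Taking logarithms turns the product into a sum, which parallels the way a multiplicative partitioning of diversity corresponds to an additive partitioning of entropy.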

As mentioned earlier, ecologists engaged in intense debates on how to partition species diversity, but in a recent Ecology forum (Ellison, 2010), contributors agreed that a first step towards reaching a consensus was to adopt Hill numbers to measure diversity. Discussions among population geneticists are less advanced because of their traditional focus on the use of genetic polymorphism data to estimate important evolutionary parameters, which requires that genetic diversity statistics be effective measures of the causes and consequences of genetic differentiation (e.g., Whitlock, 2011). Much theoretical work is still needed to demonstrate that diversity measures based on information theory do satisfy this requirement. Here, instead, we argue that the adoption of Hill numbers in population genetics is also a good starting point to reach a consensus on how to partition genetic diversity. In what follows, we first introduce Hill numbers and then present a weighted information-based decomposition framework applicable to both community and population genetics studies.

3 | OVERVIEW OF HILL NUMBERS

There are now many articles describing the application of Hill numbers. Here we follow Jost (2006), who reintroduced their use in ecology. As Jost (2006) noted, most diversity indices are in fact entropies that measure the uncertainty in the identity of species (or alleles) in a sample. However, true diversity measures should provide estimates of the number of distinct elements (species or alleles) in an aggregate (community or population). To derive such measures, we first note that diversity indices create equivalence classes among aggregates, in the sense that all aggregates with the same diversity index value can be considered as equivalent. For example, all populations with the same heterozygosity value are equivalent in terms of this index, even if they have radically different allele frequencies (see Appendix S1 for an example). Moreover, for any given heterozygosity, there will be an "ideal" population in which all alleles are equally frequent. It is therefore possible to define an "effective number of elements" (alleles in this example) as the number of equally frequent elements in an "ideal aggregate" that has the same diversity index value as the "real aggregate". An example of effective number in an ecological context is the effective number of species introduced by MacArthur (1965), while an equivalent concept in population genetics is the effective number of alleles (Kimura & Crow, 1964).
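The idea of an equivalence class can be sketched with two populations that share the same expected heterozygosity. The allele frequencies below are hypothetical, and the function names `het` and `effective_alleles` are introduced here for illustration; the effective number used is the reciprocal of expected homozygosity.

```r
# Expected heterozygosity and its number equivalent for allele frequencies p
het <- function(p) 1 - sum(p^2)
effective_alleles <- function(p) 1 / sum(p^2)

# A "real" population with three unequally frequent alleles (hypothetical)
pop_real <- c(2/3, 1/6, 1/6)

# The "ideal" population with two equally frequent alleles
pop_ideal <- c(1/2, 1/2)

rbind(real  = c(het = het(pop_real),  n_eff = effective_alleles(pop_real)),
      ideal = c(het = het(pop_ideal), n_eff = effective_alleles(pop_ideal)))
```

Both populations have a heterozygosity of 0.5, so the three-allele population is equivalent, in terms of this index, to an ideal population with two equally frequent alleles.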

Note that the concept of effective population size, N_e, used in population genetics is analogous to that of Hill numbers but is based on a rather different concept. More precisely, N_e is defined as the number of individuals in an ideal (Wright–Fisher) population that has the same magnitude of random genetic drift as the real population being studied. There are different ways in which we can measure the strength of genetic drift, the most common being change in average inbreeding coefficient, change in allele frequency variance and rate of loss of heterozygosity, and each leads to a different type of effective size. Thus, the ideal and the real populations are equivalent in terms of the rate of loss of genetic diversity and not in terms of equal representation of distinct individuals. Probably the only similarity between N_e and the rationale underlying Hill numbers is in the sense that all the individuals in the ideal population contribute equally (on average) to the gene pool of the next generation.

The application of the above-stated logic to any of the many different entropy measures used in ecology and population genetics yields a single expression for diversity:

$${}^{q}D \equiv \left( \sum_{i=1}^{S} p_i^{q} \right)^{1/(1-q)}, \tag{1}$$

where S denotes the number of species or alleles, $p_i$ denotes the relative abundance or frequency of species or allele i, and the exponent and superscript q is the order of the diversity and indicates the sensitivity of ${}^{q}D$, the numbers equivalent of the diversity measure being used, to common and rare elements (Jost, 2006). The diversity of order zero (q = 0) is completely insensitive to species or allele frequencies and is known, respectively, as species or allelic richness, depending on whether it is applied to species or allele frequency data. The diversity of order one (q = 1) weights the contribution of each species or allele by their frequency without favouring either common or rare species/alleles. Although Equation 1 is not defined for q = 1, its limit exists (Jost, 2006):

$${}^{1}D = \exp\left( -\sum_{i=1}^{S} p_i \ln p_i \right) = \exp(H), \tag{2}$$

where H is the Shannon entropy. All values of q greater than unity disproportionally favour the most common species or allele. For example, the Simpson concentration and the Gini–Simpson index, which are, respectively, equivalent to expected homozygosity and expected heterozygosity when applied to allele frequency data, lead to diversities of order 2 and give the same effective number of species or alleles:

$${}^{2}D = 1 \Big/ \left( \sum_{i=1}^{S} p_i^{2} \right). \tag{3}$$

It is worth emphasizing that, among all these different number equivalents or true diversity measures, the diversity of order 1 is key because of its ability to weigh elements precisely by their frequency without favouring either rare or common elements (Jost, 2006). Therefore, we will use this measure to define our new framework for diversity decomposition.

4 | WEIGHTED INFORMATION-BASED DECOMPOSITION FRAMEWORK (q = 1)

Our decomposition framework is focused on the information-based diversity measure (Hill number of order q = 1). In what follows, we first describe the framework in terms of abundance (species/genetic) diversities, and then we provide an equivalent formulation in terms of phylogenetic diversity. For simplicity, we will use the notation D to refer to abundance diversities and PD to refer to phylogenetic diversities, both of order q = 1. Appendix S2 lists all notation and definitions of the parameters and variables we used.

4.1 | Formulation in terms of abundance diversity

Here we develop a framework applicable to both species (abundance, presence–absence, biomass) and genetic data to estimate alpha, beta and gamma diversities (i.e., diversity components) across different levels of a hierarchical spatial structure. In this section, we consider a very simple example of an ecosystem subdivided into multiple regions, each of which in turn is subdivided into a number of communities when considering species data, or a number of populations when considering genetic data. However, our formulation is applicable to any number of levels within a spatially hierarchical partitioning scheme and their associated number of communities and populations at each level (nested scale), such as the example considered in our simulation study below (see Figure 1). Indeed, the framework described here allows decomposing species and genetic information on an equal footing, thus allowing contrasting diversity components across communities and populations. In other words, if genetic and species abundance (or presence–absence) data are available for every population and every species, then genetic and species diversity components can be contrasted within and among spatial scales as well as across different phylogenetic levels. Note that our proposed framework is based on diversities of order q = 1, which are less sensitive than diversities of higher order to the fact that genetic information is not available for all individuals in a population but is rather based on subsamples of individuals within populations. As such, using q = 1 allows one to decompose genetic variation consistently across different spatial subdivision levels that may vary in abundance.

The final objective was to decompose the global (ecosystem) diversity into its regional and community/population-level components. We do this using the well-known additive property of Shannon entropy across hierarchical levels (and thus multiplicative partitioning of diversity) (Batty, 1976; Jost, 2007). Table 1 presents the diversities (number equivalents) that need to be estimated at each level of the hierarchy. For each level, there will be one value corresponding to species diversity and another corresponding to allelic (genetic) diversity of a particular species at a given locus (or an average across loci). Figure S1 provides a schematic representation of the calculation of diversities.

From Table 1, it is apparent that we only need to use Equation 2 to calculate three diversity indices, namely $D_{\alpha}^{(1)}$, $D_{\alpha}^{(2)}$ and $D_{\gamma}$. These diversity measures are defined in terms of relative abundances of the distinct elements (species or alleles) at the respective levels of the hierarchy. In what follows, we first present the framework as applied to allele count data and then explain how a simple change in the definition of a single parameter allows the application of the same framework to species abundance data. We assume that we are considering a diploid species (but the scheme can be easily generalized for polyploid species) and focus on the diversity of order q = 1, which is based on the Shannon entropy (see Equation 1).

Genetic diversity indices are calculated separately for each locus, so we focus here on a locus with S alleles. Additionally, we consider an ecosystem subdivided into K regions, each having $J_k$ local populations. Let $N_{injk}$ be the number of diploid individuals with n (= 0, 1, 2) copies of allele i in population j and region k. Then, the total number of copies of allele i in population j and region k is $N_{ijk} = \sum_{n=0}^{2} n N_{injk}$, and from this we can derive the total number of alleles in population j and region k as $N_{+jk} = \sum_{i=1}^{S} N_{ijk}$, the total number of alleles in region k as $N_{++k} = \sum_{j=1}^{J_k} N_{+jk}$, and the total number of alleles in the ecosystem as $N_{+++} = \sum_{k=1}^{K} N_{++k}$. All allele frequencies can be derived from these allele counts. For example, the relative frequency of allele i in any given population j within region k is $p_{i|jk} = N_{ijk}/N_{+jk}$. In the case of region- and ecosystem-level allele frequencies, we pool over populations within regions and over all regions and populations within an ecosystem, respectively. We define the weight for population j and region k as $w_{jk} = N_{+jk}/N_{+++}$; the weight for region k thus becomes $w_{+k} = \sum_{j=1}^{J_k} w_{jk} = N_{++k}/N_{+++}$. Table 2 describes how allele/species relative frequencies at each level are calculated in terms of these weight functions.
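The Hill numbers of Equations 1 to 3 and the allele-count bookkeeping described above can be combined in a minimal sketch. The genotype summary `N_in`, the function name `hill` and the population counts `N_pop` with their region labels are hypothetical and serve only to illustrate the definitions for one locus with two alleles.

```r
# Hill number of order q for relative frequencies p (q = 1 handled as the limit)
hill <- function(p, q) {
  p <- p[p > 0]
  if (abs(q - 1) < 1e-12) exp(-sum(p * log(p))) else sum(p^q)^(1 / (1 - q))
}

# Hypothetical genotype summary for one population of 10 diploid individuals:
# N_in[i, n + 1] = number of individuals carrying n copies of allele i (n = 0, 1, 2)
N_in <- rbind(A1 = c(2, 4, 4),
              A2 = c(4, 4, 2))

# Allele counts N_i = sum over n of n * N_in, and allele frequencies
N_i <- drop(N_in %*% 0:2)
p   <- N_i / sum(N_i)

# Diversities of order 0, 1 and 2 for this population
D <- c(q0 = hill(p, 0), q1 = hill(p, 1), q2 = hill(p, 2))

# Hypothetical allele totals for five populations nested in two regions
N_pop  <- c(10, 35, 30, 30, 60)
region <- c(1, 1, 2, 2, 2)
w_jk   <- N_pop / sum(N_pop)           # population weights
w_k    <- tapply(w_jk, region, sum)    # region weights

round(D, 3)
```

With unequal allele frequencies, the diversity decreases as q increases, which reflects the growing emphasis on the most common allele.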

Using these frequencies, we can calculate the genetic diversities at each level of spatial organization. Table 3 presents the formulas for $D_{\alpha}^{(1)}$, $D_{\alpha}^{(2)}$ and $D_{\gamma}$; all other diversity measures can be derived from them (see Table 1). In the case of the ecosystem diversity, this amounts to simply replacing $p_i$ in Equation 2 by $p_{i|++}$, the allele frequency at the ecosystem level (see Table 2). To calculate the diversity at the regional level, we first calculate the entropy $H_{\alpha k}^{(2)}$ for each individual region k and then obtain the weighted average over all regions, $H_{\alpha}^{(2)}$. Finally, we calculate the exponential of the region-level entropy to obtain $D_{\alpha}^{(2)}$, the alpha diversity at the regional level. We proceed in a similar fashion to obtain $D_{\alpha}^{(1)}$, the diversity at the population level, but in this case we need to average over regions and populations within regions.

The calculation of the equivalent diversities based on species count data can be carried out using the exact same procedure described above, but in this case $N_{ijk}$ represents the number of individuals of species i in population j and region k. All formulas for gamma, alpha and beta, along with the differentiation measures at each level, are given in Table 3. The formulas can be directly generalized to any arbitrary number of levels (see Section 5).
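The procedure described above can be sketched for a three-level hierarchy. The example below is a simplified illustration and not the authors' implementation: the allele counts are those of the worked example in the supplementary R code, while the helper `shannon` and the other object names are introduced here. The normalizations used for the two differentiation measures are an assumption based on the weight entropies at each level.

```r
# Shannon entropy of a vector of counts
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Allele counts (rows) for five populations (columns) nested in two regions
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)

w_jk <- colSums(Data) / sum(Data)    # population weights
w_k  <- tapply(w_jk, region, sum)    # region weights

# Ecosystem level: entropy of the pooled counts
H_gamma <- shannon(rowSums(Data))

# Regional level: entropy of each pooled region, then weighted average
region_counts <- sapply(split(seq_len(ncol(Data)), region),
                        function(idx) rowSums(Data[, idx, drop = FALSE]))
H_alpha2 <- sum(w_k * apply(region_counts, 2, shannon))

# Population level: entropy of each population, then weighted average
H_alpha1 <- sum(w_jk * apply(Data, 2, shannon))

# Number equivalents and multiplicative beta components
D <- exp(c(gamma = H_gamma, alpha2 = H_alpha2, alpha1 = H_alpha1))
beta2 <- D[["gamma"]] / D[["alpha2"]]
beta1 <- D[["alpha2"]] / D[["alpha1"]]

# Differentiation: beta entropy divided by its maximum given the weights
diff2 <- (H_gamma - H_alpha2) / (-sum(w_k * log(w_k)))
diff1 <- (H_alpha2 - H_alpha1) / (-sum(w_jk * log(w_jk)) + sum(w_k * log(w_k)))

round(c(D, beta2 = beta2, beta1 = beta1, diff2 = diff2, diff1 = diff1), 3)
```

For these data, the rounded values agree with the output reported for the iDIP example in the supplementary material.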

42emsp|emspFormulation in terms of phylogenetic diversity

Wefirstpresentanoverviewofphylogeneticdiversitymeasuresap-pliedtoasinglenonhierarchicalcasehenceforthreferredtoassingleaggregateforbrevityandthenextendittoconsiderahierarchicallystructured system

4.2.1 | Phylogenetic diversity measures in a single aggregate

To formulate phylogenetic diversity in a single aggregate, we assume that all species or alleles in an aggregate are connected by a rooted ultrametric or nonultrametric phylogenetic tree, with all species/alleles as tip nodes. All phylogenetic diversity measures discussed below are computed from a given fixed tree base, or a time reference point, that is ancestral to all species/alleles in the aggregate. A convenient time reference point is the age of the root of the phylogenetic tree spanned by all elements. Assume that there are $B$ branch segments in the tree and thus there are $B$ corresponding nodes, $B \geq S$. The set of species/alleles is expanded to include also the internal nodes as well as the terminal nodes representing species/alleles, which will then be the first $S$ elements (see Figure S2).

FIGURE 1. The spatial representation of 32 populations organized into a spatial hierarchy based on three scale levels: subregions (eight populations each), regions (16 populations each) and the ecosystem (all 32 populations). The dendrogram (upper panel: hierarchical representation of levels) represents the spatial relationship (i.e., geographic distance) in which each tip represents a population found in a particular site (lower panel). The cartographic representation (lower panel) represents the spatial distribution of these same populations along a geographic coordinate system.

Let $L_i$ denote the length of branch $i$ in the tree, $i = 1, 2, \ldots, B$. We first expand the set of relative abundances of elements $(p_1, p_2, \ldots, p_S)$ (see Equation 1) to a larger set $\{a_i, i = 1, 2, \ldots, B\}$ by defining $a_i$ as the total relative abundance of the elements descended from the $i$th node/branch, $i = 1, 2, \ldots, B$. In phylogenetic diversity, an important parameter is the mean branch length $\bar{T}$, the abundance-weighted mean of the distances from the tree base to each of the terminal branch tips, that is, $\bar{T} = \sum_{i=1}^{B} L_i a_i$. For an ultrametric tree, the mean branch length simply reduces to the tree depth $T$; see Figure 1 in Chao, Chiu, and Jost (2010) for an example. For simplicity, our following formulation of phylogenetic diversity is based on ultrametric trees. The extension to nonultrametric trees is straightforward (via replacing $T$ by $\bar{T}$ in all formulas).

Chao et al. (2010, 2014) generalized Hill numbers to a class of phylogenetic diversity of order $q$, ${}^{q}PD$, derived as

$$ {}^{q}PD = \left[\sum_{i=1}^{B} L_i \left(\frac{a_i}{T}\right)^{q}\right]^{1/(1-q)} \tag{4} $$

This measure quantifies the effective total branch length during the time interval from $T$ years ago to the present. If $q = 0$, then ${}^{0}PD = \sum_{i=1}^{B} L_i$, which is the well-known Faith’s PD, the sum of the branch lengths of a phylogenetic tree connecting all species. However, this measure does not consider species abundances. Rao’s quadratic entropy $Q$ (Rao & Nayak, 1985) is a widely used measure that takes into account both phylogeny and species abundances. This measure is a generalization of the Gini–Simpson index and quantifies the average phylogenetic distance between any two individuals randomly selected from the assemblage. Chao et al. (2010) showed that the ${}^{q}PD$ measure of order $q = 2$ is a simple transformation of quadratic entropy, that is, ${}^{2}PD = T/(1 - Q/T)$. Again, here we focus on the ${}^{q}PD$ measure of order $q = 1$, which can be expressed as a function of the phylogenetic entropy (Allen, Kon, & Bar-Yam, 2009):

$$ {}^{1}PD = \lim_{q \to 1} {}^{q}PD = \exp\left[-\sum_{i=1}^{B} L_i \frac{a_i}{T} \ln\left(\frac{a_i}{T}\right)\right] \equiv T \exp(I/T) \tag{5} $$

Here $I$ denotes the phylogenetic entropy

$$ I = -\sum_{i=1}^{B} L_i a_i \ln a_i \tag{6} $$

which is a generalization of Shannon’s entropy that incorporates phylogenetic distances among elements. Note that when there are only tip nodes and all branches have unit length, then we have $T = 1$ and ${}^{q}PD$ reduces to the Hill number of order $q$ (in Equation 1).

4.2.2 | Phylogenetic diversity decomposition in a multiple-level hierarchically structured system

The single-aggregate formulation can be extended to consider a hierarchical spatially structured system. For the sake of simplicity, we consider three levels (ecosystem, region and community/population), as we did for the species/allelic diversity decomposition. Assume that there are $S$ elements in the ecosystem. For the rooted phylogenetic tree spanned by all $S$ elements in the ecosystem, we define the root (or a time reference point), the number of nodes/branches $B$ and the branch length $L_i$ in a similar manner as those in a single aggregate.

For the tip nodes, as in the framework of species and allelic diversity (in Table 2), define $p_{i \mid jk}$, $p_{i \mid +k}$ and $p_{i \mid ++}$, $i = 1, 2, \ldots, S$, as the $i$th species or allele relative frequencies at the population, regional and ecosystem level, respectively. To expand these relative frequencies to the branch set, we define $a_{i \mid jk}$, $i = 1, 2, \ldots, B$, as the summed relative abundance of the species/alleles descended from the $i$th node/branch in population $j$ and region $k$, with similar definitions for $a_{i \mid +k}$ and $a_{i \mid ++}$, $i = 1, 2, \ldots, B$; see Figure 1 of Chao et al. (2015) for an illustrative example. The decomposition for phylogenetic diversity is similar to that for Hill numbers presented in Table 1, except that now all measures are replaced by phylogenetic diversity. The corresponding phylogenetic gamma, alpha and beta diversities at each level are given in Table 3, along with the corresponding differentiation measures. Appendix S3 presents all mathematical derivations and discusses the desirable monotonicity and “true dissimilarity” properties that our proposed differentiation measures possess.

TABLE 1. Various diversities in a hierarchically structured system and their decomposition based on diversity measure $D = {}^{1}D$ (Hill number of order $q = 1$ in Equation 2); for phylogenetic diversity decomposition, replace $D$ with $PD = {}^{1}PD$ (phylogenetic diversity measure of order $q = 1$ in Equation 5); see Table 3 for all formulas for $D$ and $PD$. The superscripts (1) and (2) denote the hierarchical level of focus.

| Hierarchical level | Diversity within | Diversity between | Total diversity | Decomposition |
|---|---|---|---|---|
| 3 Ecosystem | − | − | $D_\gamma$ | $D_\gamma = D_\alpha^{(1)} D_\beta^{(1)} D_\beta^{(2)}$ |
| 2 Region | $D_\alpha^{(2)}$ | $D_\beta^{(2)} = D_\gamma^{(2)} / D_\alpha^{(2)}$ | $D_\gamma^{(2)} = D_\gamma$ | $D_\gamma = D_\alpha^{(2)} D_\beta^{(2)}$ |
| 1 Community or population | $D_\alpha^{(1)}$ | $D_\beta^{(1)} = D_\gamma^{(1)} / D_\alpha^{(1)}$ | $D_\gamma^{(1)} = D_\alpha^{(2)}$ | $D_\alpha^{(2)} = D_\alpha^{(1)} D_\beta^{(1)}$ |

TABLE 2. Calculation of allele/species relative frequencies at the different levels of the hierarchical structure

| Hierarchical level | Species/allele relative frequency |
|---|---|
| Population | $p_{i \mid jk} = N_{ijk}/N_{+jk} = N_{ijk} \big/ \sum_{i=1}^{S} N_{ijk}$ |
| Region | $p_{i \mid +k} = N_{i+k}/N_{++k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i \mid jk}$ |
| Ecosystem | $p_{i \mid ++} = N_{i++}/N_{+++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i \mid jk}$ |
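Equations 4 to 6 for a single aggregate can be checked numerically with a short Python sketch. The representation of a tree as a list of `(L_i, a_i)` pairs, the function name and the small three-species tree are assumptions introduced here for illustration; the tree is taken to be ultrametric with depth $T$.

```python
import math

def phylo_diversity(branches, q, T):
    """Phylogenetic diversity of order q (Equation 4, with Equation 5 for q = 1).

    branches: list of (L_i, a_i) pairs, one per branch, where L_i is the branch
              length and a_i the summed relative abundance of its descendants.
    T:        depth of the (assumed ultrametric) tree.
    """
    if q == 1:
        # Phylogenetic entropy I (Equation 6), then 1PD = T * exp(I / T).
        I = -sum(L * a * math.log(a) for L, a in branches if a > 0)
        return T * math.exp(I / T)
    total = sum(L * (a / T) ** q for L, a in branches if a > 0)
    return total ** (1.0 / (1.0 - q))

# Hypothetical tree of depth T = 2: species A and B are sisters (tip branches
# of length 1 below a shared internal branch of length 1); species C sits on a
# single branch of length 2. Relative abundances are 0.25, 0.25 and 0.5.
T = 2.0
branches = [(1.0, 0.25), (1.0, 0.25), (2.0, 0.5), (1.0, 0.5)]

faith_pd = phylo_diversity(branches, 0, T)   # sum of branch lengths
pd1 = phylo_diversity(branches, 1, T)
pd2 = phylo_diversity(branches, 2, T)

# Rao's quadratic entropy for the same tree, to check 2PD = T / (1 - Q / T).
Q = sum(L * a * (1.0 - a) for L, a in branches)
```

With this tree, order 0 returns the total branch length (Faith’s PD), and order 2 agrees with the transformation of Rao’s $Q$ quoted above. A star tree with unit branch lengths and $T = 1$ returns the ordinary Hill number.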

5 | IMPLEMENTATION OF THE FRAMEWORK BY MEANS OF AN R PACKAGE

The framework described above has been implemented in the R function iDIP (information-based Diversity Partitioning), which is provided as Data S1. We also provide a short introduction with a simple example data set to explain how to obtain numerical results equivalent to those provided in Tables 4 and 5 below for the Hawaiian archipelago example data set.

The R function iDIP requires two input matrices:

1. Abundance data specifying species/alleles (rows) raw or relative abundances for each population/community (columns).

2. Structure matrix describing the hierarchical structure of spatial subdivision; see a simple example given in Data S1. There is no limit to the number of spatial subdivisions.

The output includes (i) gamma (or total) diversity, and alpha and beta diversity for each level; (ii) the proportion of total beta information (among aggregates) found at each level; and (iii) mean differentiation (dissimilarity) at each level.
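To make the two inputs concrete, the following Python sketch mimics the idea of an abundance matrix plus a structure matrix for an arbitrary number of levels. It is not the iDIP function and does not reproduce its signature or output format. The layout (`abundance[i][j]` with species/alleles in rows and populations in columns; `structure[level][j]` giving the label of the aggregate that contains population `j` at each level, top level first) and the function name are assumptions made for this illustration.

```python
import math
from collections import defaultdict

def level_diversities(abundance, structure):
    """Order-1 diversity at each hierarchical level, top level first.

    abundance[i][j]: abundance of species/allele i in population j.
    structure[l][j]: label of the aggregate containing population j at level l
                     (hypothetical layout; every aggregate must be non-empty).
    """
    n_pops = len(abundance[0])
    total = sum(sum(row) for row in abundance)
    diversities = []
    for labels in structure:
        # Pool the abundances of all populations sharing a label at this level.
        groups = defaultdict(lambda: [0.0] * len(abundance))
        for j in range(n_pops):
            for i, row in enumerate(abundance):
                groups[labels[j]][i] += row[j]
        # Weighted mean entropy; weight = aggregate's share of all individuals.
        h = 0.0
        for pooled in groups.values():
            size = sum(pooled)
            h_group = -sum((n / size) * math.log(n / size) for n in pooled if n > 0)
            h += (size / total) * h_group
        diversities.append(math.exp(h))
    return diversities

# Invented example: four alleles, four populations, two regions.
abundance = [
    [10, 0, 0, 0],
    [0, 10, 0, 0],
    [0, 0, 10, 0],
    [0, 0, 0, 10],
]
structure = [
    ["Ecosystem"] * 4,
    ["Region1", "Region1", "Region2", "Region2"],
    ["Pop1", "Pop2", "Pop3", "Pop4"],
]
levels = level_diversities(abundance, structure)
# Beta diversity between adjacent levels is the ratio of successive values.
betas = [levels[l] / levels[l + 1] for l in range(len(levels) - 1)]
```

Adding a further subdivision only requires one more row of labels in `structure`, which mirrors the statement that the number of spatial subdivisions is unlimited.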

We also provide the R function iDIP.phylo, which implements an information-based decomposition of phylogenetic diversity and, therefore, can take into account the evolutionary history of the species being studied. This function requires the two matrices mentioned above plus a phylogenetic tree in Newick format. For interested users without knowledge of R, we also provide an online version available from https://chao.shinyapps.io/iDIP/. This interactive web application was developed using Shiny (https://shiny.rstudio.com). The webpage contains tabs providing a short introduction describing how to use the tool, along with a detailed User’s Guide, which provides proper interpretations of the output through numerical examples.

6 | SIMULATION STUDY TO SHOW THE CHARACTERISTICS OF THE FRAMEWORK

Here, we describe a simple simulation study to demonstrate the utility and numerical behaviour of the proposed framework. We considered an ecosystem composed of 32 populations divided into four hierarchical levels (ecosystem, region, subregion, population; Figure 1). The number of populations at each level was kept constant across all simulations (i.e., ecosystem with 32 populations, regions with 16 populations each and subregions with eight populations each). Note that here we used a hierarchy with four spatial subdivisions instead of the three levels used in the presentation of the framework. We made this choice because three levels keep the presentation of the calculations simple, whereas four levels in the simulations allow a more in-depth test of the performance of the framework.

TABLE 3. Formulas for α, β and γ along with differentiation measures at each hierarchical level of spatial subdivision for species/allelic diversity and phylogenetic diversity. Here $D = {}^{1}D$ (Hill number of order $q = 1$ in Equation 2), $PD = {}^{1}PD$ (phylogenetic diversity of order $q = 1$ in Equation 5), $T$ denotes the depth of an ultrametric tree, $H$ = Shannon entropy (Equation 2), $I$ = phylogenetic entropy (Equation 6)

| Hierarchical level | Diversity | Species/allelic diversity | Phylogenetic diversity |
|---|---|---|---|
| Level 3: Ecosystem | gamma | $D_\gamma = \exp\left(-\sum_{i=1}^{S} p_{i \mid ++} \ln p_{i \mid ++}\right) \equiv \exp(H_\gamma)$ | $PD_\gamma = T \times \exp\left(-\sum_{i=1}^{B} L_i a_{i \mid ++} \ln a_{i \mid ++} / T\right) \equiv T \times \exp(I_\gamma / T)$ |
| Level 2: Region | gamma | $D_\gamma^{(2)} = D_\gamma$ | $PD_\gamma^{(2)} = PD_\gamma$ |
| | alpha | $D_\alpha^{(2)} = \exp\left(H_\alpha^{(2)}\right)$, where $H_\alpha^{(2)} = \sum_k w_{+k} H_{\alpha k}^{(2)}$ and $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i \mid +k} \ln p_{i \mid +k}$ | $PD_\alpha^{(2)} = T \times \exp\left(I_\alpha^{(2)} / T\right)$, where $I_\alpha^{(2)} = \sum_k w_{+k} I_{\alpha k}^{(2)}$ and $I_{\alpha k}^{(2)} = -\sum_{i=1}^{B} L_i a_{i \mid +k} \ln a_{i \mid +k}$ |
| | beta | $D_\beta^{(2)} = D_\gamma^{(2)} / D_\alpha^{(2)}$ | $PD_\beta^{(2)} = PD_\gamma^{(2)} / PD_\alpha^{(2)}$ |
| Level 1: Population or community | gamma | $D_\gamma^{(1)} = D_\alpha^{(2)}$ | $PD_\gamma^{(1)} = PD_\alpha^{(2)}$ |
| | alpha | $D_\alpha^{(1)} = \exp\left(H_\alpha^{(1)}\right)$, where $H_\alpha^{(1)} = \sum_{j,k} w_{jk} H_{\alpha jk}^{(1)}$ and $H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i \mid jk} \ln p_{i \mid jk}$ | $PD_\alpha^{(1)} = T \times \exp\left(I_\alpha^{(1)} / T\right)$, where $I_\alpha^{(1)} = \sum_{j,k} w_{jk} I_{\alpha jk}^{(1)}$ and $I_{\alpha jk}^{(1)} = -\sum_{i=1}^{B} L_i a_{i \mid jk} \ln a_{i \mid jk}$ |
| | beta | $D_\beta^{(1)} = D_\gamma^{(1)} / D_\alpha^{(1)}$ | $PD_\beta^{(1)} = PD_\gamma^{(1)} / PD_\alpha^{(1)}$ |
| Differentiation among aggregates at each level | | | |
| Level 2: Among regions | | $\Delta_D^{(2)} = \dfrac{H_\gamma - H_\alpha^{(2)}}{-\sum_k w_{+k} \ln w_{+k}}$ | $\Delta_{PD}^{(2)} = \dfrac{I_\gamma - I_\alpha^{(2)}}{-T \sum_k w_{+k} \ln w_{+k}}$ |
| Level 1: Population/community within region | | $\Delta_D^{(1)} = \dfrac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{j,k} w_{jk} \ln(w_{jk}/w_{+k})}$ | $\Delta_{PD}^{(1)} = \dfrac{I_\alpha^{(2)} - I_\alpha^{(1)}}{-T \sum_{j,k} w_{jk} \ln(w_{jk}/w_{+k})}$ |

We explored six scenarios varying in the degree of genetic structuring from very strong (Figure 2, top left panel) to very weak (Figure 2, bottom right panel), and for each we generated spatially structured genetic data for 10 unlinked bi-allelic loci using an algorithm loosely based on the genetic model of Coop, Witonsky, Di Rienzo, and Pritchard (2010). More explicitly, to generate correlated allele frequencies across populations for bi-allelic loci, we drew 10 random vectors of dimension 32 from a multivariate normal distribution with mean zero and a covariance matrix corresponding to the particular genetic structure scenario being considered. To construct the covariance matrix, we first assumed that the covariance between populations decreased with distance, so that the off-diagonal elements (covariances) for closest geographic neighbours were set to 4, for the second nearest neighbours were set to 3 and so on; as such, the main diagonal values (variance) were set to 5. By multiplying the off-diagonal elements of this variance–covariance matrix by a constant ($\delta$), we manipulated the strength of the spatial genetic structure from strong ($\delta = 0.1$; Figure 2) to weak ($\delta = 6$; Figure 2). Delta values were chosen to demonstrate gradual changes in estimates across diversity components. Using this procedure, we generated a matrix of random normally distributed $N(0, 1)$ deviates $\varepsilon_{il}$ for each population $i$ and locus $l$. The random deviates were transformed into allele frequencies constrained between 0 and 1 using the simple transform

$$ p_{il} = \begin{cases} 0 & \text{if } \varepsilon_{il} < 0 \\ \varepsilon_{il} & \text{if } 0 \le \varepsilon_{il} \le 1 \\ 1 & \text{if } \varepsilon_{il} > 1 \end{cases} $$

where $p_{il}$ is the relative frequency of allele A1 at the $l$th locus in population $i$, and therefore $q_{il} = (1 - p_{il})$ is the relative frequency of allele A2. Each bi-allelic locus was analysed separately by our framework, and estimated values of $D_\gamma$, $D_\alpha$ and $D_\beta$ for each spatial level (see Figure 1) were averaged across the 10 loci.

To simulate a realistic distribution of number of individuals across populations, we generated random values from a log-normal distribution with mean 0 and log of standard deviation 1; these values were then multiplied by randomly generated deviates from a Poisson distribution with $\lambda = 30$ to obtain a wide range of population/community sizes. Rounded values (to mimic abundances of individuals) were then multiplied by $p_{il}$ and $q_{il}$ to generate allele abundances. Given that the number of individuals was randomly generated across populations, there is no spatial correlation in abundance of individuals across the landscape, which means that the genetic spatial patterns were solely determined by the variance–covariance matrix used to generate correlated allele frequencies across populations. This facilitates interpretation of the simulation results, allowing us to demonstrate that the framework can uncover subtle spatial effects associated with population connectivity (see below).

For each spatial structure, we generated 100 matrices of allele frequencies, and each matrix was analysed separately to obtain distributions for $D_\gamma$, $D_\alpha$, $D_\beta$ and $\Delta_D$. Figure 2 presents heat maps of the correlation in allele frequencies across populations for one simulated data set under each $\delta$ value and shows that our algorithm can generate a wide range of genetic structures comparable to those generated by other more complex simulation protocols (e.g., de Villemereuil, Frichot, Bazin, Francois, & Gaggiotti, 2014).

Figure 3 shows the distribution of $D_\alpha$, $D_\beta$ and $\Delta_D$ values for the three levels of geographic variation below the ecosystem level (i.e., $D_\gamma$, genetic diversity). The results clearly show that our framework detects differences in genetic diversity across different levels of spatial genetic structure. As expected, the effective number of alleles ($D_\alpha$ component; top row) increases per region and subregion as the spatial structure becomes weaker (i.e., from small to large $\delta$ values) but remains constant at the population level, as there is no spatial structure at this level (i.e., populations are panmictic), so diversity is independent of $\delta$.

The $D_\beta$ component (middle row) quantifies the effective number of aggregates (regions, subregions, populations) at each hierarchical level of spatial subdivision. The larger the number of aggregates at a given level, the more heterogeneous that level is. Thus, it is also a measure of compositional dissimilarity at each level. We use this interpretation to describe the results in a more intuitive manner. As expected, as $\delta$ increases, dissimilarity between regions (middle left panel) decreases because spatial genetic structure becomes weaker, and the compositional dissimilarity among populations within subregions (middle right panel) increases because the strong spatial correlation among populations within subregions breaks down (Figure 3, centre left panel). The compositional dissimilarity between subregions within regions (Figure 3, middle centre panel) first increases and then decreases with increasing $\delta$. This is due to an “edge effect” associated with the marginal status of the subregions at the extremes of the species range (extreme right and left subregions in Figure 2). As $\delta$ increases, the composition of the two subregions at the centre of the species range, which belong to different regions, changes more rapidly than that of the two marginal subregions. Thus, the compositional dissimilarity between subregions within regions increases. However, as $\delta$ continues to increase, spatial effects disappear and dissimilarity decreases.
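The truncation transform and the generation of allele abundances in the simulation protocol can be sketched as follows. This is a hedged illustration rather than the authors' simulation code: the function names are invented here, the log-normal is parameterized with `mu = 0` and `sigma = 1` as one reading of the description, the Poisson sampler is a simple textbook method, and a single independent $N(0, 1)$ draw stands in for the spatially correlated multivariate normal deviates. Rounding the A1 count and assigning the remainder to A2 is also an assumption, made so that the two counts sum to the population size.

```python
import math
import random

def truncate_to_frequency(eps):
    """Piecewise transform: 0 if eps < 0, eps if 0 <= eps <= 1, 1 if eps > 1."""
    return min(1.0, max(0.0, eps))

def poisson(lam, rng):
    """Poisson deviate via Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def population_size(rng, lam=30):
    """Log-normal value times a Poisson(lam) deviate, rounded to an integer."""
    return round(rng.lognormvariate(0.0, 1.0) * poisson(lam, rng))

def allele_counts(eps, size):
    """Counts of alleles A1 and A2 in a population of `size` individuals."""
    p = truncate_to_frequency(eps)
    n_a1 = round(size * p)          # assumption: round A1, give the rest to A2
    return n_a1, size - n_a1

rng = random.Random(2018)           # fixed seed so the sketch is repeatable
size = population_size(rng)
counts = allele_counts(rng.gauss(0.0, 1.0), size)   # independent stand-in deviate
```

Because roughly half of all standard normal deviates are negative, the truncation fixes allele A2 in many populations, which is one reason the per-locus effective number of alleles stays well below two.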

FIGURE 2. Heat maps of allele frequency correlations between pairs of populations for different $\delta$ values. Delta values control the strength of the spatial genetic structure among populations, with low $\delta$ values having the strongest spatial correlation among populations. Each heat map represents the outcome of a single simulation, and each dot represents the allele frequency correlation between two populations. Thus, the diagonal represents the correlation of a population with itself and is always 1 regardless of the $\delta$ value considered in the simulation. Colours indicate the range of correlation values. As in Figure 1, the dendrograms represent the spatial relationship (i.e., geographic distance) between populations.


The differentiation component $\Delta_D$ (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across $\delta$ values) as the compositional dissimilarity $D_\beta$. This is expected, as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix in which different $\delta$ values were used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength of spatial genetic variation.

FIGURE 3. Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity, gamma, is reported in the text only) across 100 simulated populations as a function of the strength ($\delta$ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions).

For the sake of space, we do not show how the total effective number of alleles in the ecosystem ($\gamma$ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with $\delta$: $D_\gamma = 1.6$ on average across simulations for $\delta = 0.1$, up to $D_\gamma = 1.9$ for $\delta = 6$. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here, regions, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.

7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM

All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish, Etelis coruscans (Andrews et al., 2014), and a shallow-water fish, Zebrasoma flavescens (Eble et al., 2011).

The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km south-west of the MHI at Johnston Atoll and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston’s reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).

FIGURE 4. Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass.

We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.

7.1 | Species diversity

Table 4 presents the decomposition of fish species diversity of order $q = 1$. The effective number of species, $D_\gamma$, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_\alpha^{(2)} = 37.77$; Island: $D_\alpha^{(1)} = 27.75$). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_\beta^{(2)} = 1.29$ represents the number of region equivalents in the Hawaiian archipelago, while $D_\beta^{(1)} = 1.361$ is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on the sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain $\Delta_D$ (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
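The arithmetic linking the values reported in Table 4 can be reproduced directly. The short Python check below uses only the gamma and alpha diversities quoted in Table 4; the variable names are ours. The differentiation measures $\Delta_D$ cannot be recomputed this way because they also require the region and island weights, which are not listed in the table.

```python
# Values copied from Table 4 (fish species diversity of order q = 1).
d_gamma = 48.744    # ecosystem level
d_alpha2 = 37.773   # regional level
d_alpha1 = 27.752   # island level

# Beta diversities follow from the multiplicative decomposition of Table 1.
d_beta2 = d_gamma / d_alpha2    # number of region equivalents
d_beta1 = d_alpha2 / d_alpha1   # mean number of island equivalents per region

# Species equivalents lost when descending one level in the hierarchy.
lost_to_region = d_gamma - d_alpha2
lost_to_island = d_alpha2 - d_alpha1

print(round(d_beta2, 3), round(d_beta1, 3))
print(round(lost_to_region, 2), round(lost_to_island, 2))
```

Both ratios agree with the beta diversities reported in Table 4 to three decimals, and each step down the hierarchy removes roughly 10 to 11 species equivalents.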

We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders ($q = 0, 1, 2$) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as $q$ increases. Species richness (diversity of order $q = 0$) is much larger than the diversities of order $q = 1, 2$, which indicates that all islands contain several rare species. Conversely, diversities of order $q = 1$ and 2 for Nihoa (and to a lesser extent Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
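The effect of the order $q$ on a dominated assemblage can be illustrated with a small Python sketch of Hill numbers. The community below is entirely hypothetical (one species at 55% and nine species at 5% each); it is loosely inspired by the dominance reported for Nihoa but does not use the Hawaiian data, and the function name is ours.

```python
import math

def hill_number(freqs, q):
    """Hill number of order q for a vector of relative frequencies."""
    p = [x for x in freqs if x > 0]
    if q == 1:
        # Limit q -> 1: exponential of Shannon entropy (Equation 2).
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

# Hypothetical community: one dominant species plus nine equally rare ones.
community = [0.55] + [0.05] * 9
profile = [hill_number(community, q) for q in (0, 1, 2)]

# A perfectly even community of four species, for comparison.
even_profile = [hill_number([0.25] * 4, q) for q in (0, 1, 2)]
```

For the even community all three orders return the species count, whereas for the dominated community the diversity falls as $q$ grows, because rare species contribute less and less to the measure.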

Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.

7.2 | Genetic diversity

Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases, genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.

TABLE 4. Decomposition of fish species diversity of order $q = 1$ and differentiation measures for the Hawaiian coral reef ecosystem

| Level | Diversity |
|---|---|
| 3 Hawaiian Archipelago | $D_\gamma = 48.744$ |
| 2 Region | $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)} = 37.773$, $D_\beta^{(2)} = 1.290$ |
| 1 Island (community) | $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)} = 27.752$, $D_\beta^{(1)} = 1.361$ |
| Differentiation among aggregates at each level | |
| 2 Region | $\Delta_D^{(2)} = 0.290$ |
| 1 Island (community) | $\Delta_D^{(1)} = 0.153$ |

Overall, allelic diversity of all orders ($q = 0, 1, 2$) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.

As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).

8 | DISCUSSION

Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order $q = 1$) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications, including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.

Hill numbers are parameterized by order $q$, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order $q = 1$, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon’s entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order $q = 2$ and, therefore, give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g., bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order $q = 0, 1, 2$ to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order $q = 0$ much larger than those of order $q = 1, 2$ indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order $q = 1, 2$ indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts the Hill number with respect to the order $q \ge 0$, contains all information about alleles/species abundance distributions.

FIGURE 5. Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders $q$ = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities; (b) genetic diversity for Etelis coruscans; (c) genetic diversity for Zebrasoma flavescens. [Panel titles: (a) species diversity; (b) E. coruscans; (c) Z. flavescens]

As proved by Chao et al. (2015, appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population, and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order $q = 2$, such as $G_{ST}$ and Jost’s $D$, possess neither of these two properties.

Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and “true dissimilarity” properties (Appendix S3). Therefore, our proposed beta diversity of order $q = 1$ at each level is always interpretable and realistic, and our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order $q = 1$.

FIGURE 6. Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness; (b) allelic richness for Etelis coruscans; (c) allelic richness for Zebrasoma flavescens. [Panel titles: (a) Species diversity; (b) Genetic diversity, E. coruscans; (c) Genetic diversity, Z. flavescens]

TABLE 5. Decomposition of genetic diversity of order $q = 1$ and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

| Level | Diversity |
|---|---|
| 3 Hawaiian Archipelago | $D_\gamma = 8.249$ |
| 2 Region | $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)} = 8.083$, $D_\beta^{(2)} = 1.016$ |
| 1 Island (population) | $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)} = 7.077$, $D_\beta^{(1)} = 1.117$ |
| Differentiation among aggregates at each level | |
| 2 Region | $\Delta_D^{(2)} = 0.023$ |
| 1 Island (population) | $\Delta_D^{(1)} = 0.062$ |

Recently, Karlin and Smouse (2017; Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the “true dissimilarity” property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of “true dissimilarity” can be enunciated as follows: if $N$ communities each have $S$ equally common species, with exactly $A$ species shared by all of them and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give $1 - A/S$, the true proportion of nonshared species in a community. Karlin and Smouse’s (2017) measures are useful in quantifying other aspects of differentiation among aggregates but do not measure “true dissimilarity.” Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse’s measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
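The two-population example can be verified with a few lines of Python. The sketch applies the two-level form of the differentiation measure, $\Delta_D = (H_\gamma - H_\alpha)/(-\sum_k w_k \ln w_k)$, with equal weights, which is an assumption since the example does not state population sizes; the function names are ours. It does not attempt to reproduce Karlin and Smouse’s measure, whose formula is not given here.

```python
import math

def shannon(freqs):
    """Shannon entropy (natural log), skipping zero frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def differentiation(pop_freqs, weights):
    """Two-level differentiation: (H_gamma - H_alpha) / entropy of the weights."""
    n_types = len(pop_freqs[0])
    pooled = [sum(w * pop[i] for w, pop in zip(weights, pop_freqs))
              for i in range(n_types)]
    h_gamma = shannon(pooled)
    h_alpha = sum(w * shannon(pop) for w, pop in zip(weights, pop_freqs))
    return (h_gamma - h_alpha) / shannon(weights)

# Populations I and II: 10 equally frequent alleles each, 4 of them shared.
shared, private = 4, 6
pop1 = [0.1] * shared + [0.1] * private + [0.0] * private
pop2 = [0.1] * shared + [0.0] * private + [0.1] * private
delta = differentiation([pop1, pop2], [0.5, 0.5])   # equal weights assumed
```

Analytically, $H_\alpha = \ln 10$ and $H_\gamma = \ln 10 + 0.6 \ln 2$, so the ratio is $0.6 \ln 2 / \ln 2 = 0.6 = 1 - A/S$, the nonshared proportion required by the “true dissimilarity” property.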

In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would be surely observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
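The negative bias of the observed (plug-in) entropy can be illustrated with a small simulation. This is a sketch under stated assumptions: the community below (3 common and 17 rare species) is hypothetical, the sample sizes are arbitrary, and the bias-corrected estimator of Chao and Jost (2015) is not implemented here.

```python
import math
import random


def plug_in_entropy(counts):
    """Observed Shannon entropy obtained by substituting sample proportions."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def mean_plug_in_entropy(true_freqs, sample_size, replicates, seed=1):
    """Average plug-in entropy over repeated multinomial samples."""
    rng = random.Random(seed)
    categories = range(len(true_freqs))
    total = 0.0
    for _ in range(replicates):
        draws = rng.choices(categories, weights=true_freqs, k=sample_size)
        counts = [0] * len(true_freqs)
        for d in draws:
            counts[d] += 1
        total += plug_in_entropy(counts)
    return total / replicates


# Hypothetical community: 3 common species and 17 rare ones.
true_freqs = [0.2] * 3 + [0.4 / 17] * 17
true_h = -sum(p * math.log(p) for p in true_freqs)

mean_h_small = mean_plug_in_entropy(true_freqs, sample_size=30, replicates=2000)
mean_h_large = mean_plug_in_entropy(true_freqs, sample_size=300, replicates=2000)

print("true entropy:         ", true_h)
print("mean plug-in (n = 30): ", mean_h_small)
print("mean plug-in (n = 300):", mean_h_large)
```

On average, the plug-in value falls below the true entropy, and the shortfall shrinks as the sample size grows, which is the pattern described above for undersampled communities.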

Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity $D_\beta$ and differentiation $\Delta_D$ measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.

The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.

Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci.

Level                     Diversity
3 Hawaiian Archipelago    $D_\gamma = 8.404$
2 Region                  $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)} = 8.290$, $D_\beta^{(2)} = 1.012$
1 Island (community)      $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)} = 7.690$, $D_\beta^{(1)} = 1.065$

Differentiation among aggregates at each level
2 Region                  $\Delta_D^{(2)} = 0.014$
1 Island (community)      $\Delta_D^{(1)} = 0.033$
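The entries of Table 6 can be read as a chain of ratios: at each level, beta diversity is gamma diversity divided by alpha diversity, and the gamma diversity of one level is the alpha diversity of the level above. The short check below uses only the values printed in Table 6. The small gaps between the recomputed ratios and the tabulated betas are assumed here to arise because the table reports averages over 13 loci, and a ratio of averages generally differs from an average of per-locus ratios; that explanation is an assumption of this sketch, not a statement from the analysis itself.

```python
# Values copied from Table 6 (averages over 13 loci).
d_gamma = 8.404                       # ecosystem-level gamma diversity
d_alpha_2, d_beta_2 = 8.290, 1.012    # regional level
d_alpha_1, d_beta_1 = 7.690, 1.065    # island (community) level

# Beta diversity as gamma / alpha at each level, recomputed from the averages.
ratio_2 = d_gamma / d_alpha_2         # D_gamma^(2) = D_gamma
ratio_1 = d_alpha_2 / d_alpha_1       # D_gamma^(1) = D_alpha^(2)

print(f"regional level: ratio of averages = {ratio_2:.3f}, tabulated beta = {d_beta_2}")
print(f"island level:   ratio of averages = {ratio_1:.3f}, tabulated beta = {d_beta_1}")
```

Both betas are close to 1, that is, the effective number of regions, and of islands within a region, is barely above one for this species.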

We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization, and therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity; genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.

In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)'s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)'s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.

There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level is transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithm may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.

The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.

An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and in doing so provides new tools to make inferences about biodiversity processes from observed spatial patterns.


ACKNOWLEDGEMENTS

This work was assisted through participation in the "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.

DATA ARCHIVING STATEMENT

All data used in this manuscript are available in DRYAD (https://doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).

ORCID

Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493

Christine Edwards http://orcid.org/0000-0001-8837-4872

REFERENCES

Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2

Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101

Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.

Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665

Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.

Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.

Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1 (edited by S. E. Fienberg), 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654

Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551

Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247

Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272

Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45 (edited by D. J. Futuyma), 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540

Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349

Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471

Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014

Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819

Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083

Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1

Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2

Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036

Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75

Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352

Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x

Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1

Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x

Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1

Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x

Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum × falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.


Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.

Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.

Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3

Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x

Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.

Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299

Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x

Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x

Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.

Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x

Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.

Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.

Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696

Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B-Biological Sciences, 283(1829).

Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z

Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765

Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x

Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.

Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422

Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756

de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705

Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.

Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x

Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516

Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.

Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593


APPENDIX S1 Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
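The table can be checked with a few lines of arithmetic. The sketch below computes the expected heterozygosity He = 1 − Σ p² and the heterozygosity-based effective number of alleles 1/(1 − He) for both populations; the function names are illustrative only.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity: one minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def effective_number_of_alleles(freqs):
    """Number of equally frequent alleles giving the same heterozygosity."""
    return 1.0 / sum(p * p for p in freqs)


pop_1 = [0.5, 0.5, 0.0, 0.0, 0.0]
pop_2 = [0.01, 0.10, 0.665, 0.215, 0.01]

he_1 = expected_heterozygosity(pop_1)
he_2 = expected_heterozygosity(pop_2)
n_eff_1 = effective_number_of_alleles(pop_1)
n_eff_2 = effective_number_of_alleles(pop_2)

print(f"population 1: He = {he_1:.4f}, effective number of alleles = {n_eff_1:.3f}")
print(f"population 2: He = {he_2:.4f}, effective number of alleles = {n_eff_2:.3f}")
```

Population 2 carries five alleles, yet its heterozygosity matches that of the two-allele "ideal" population to within rounding, so its effective number of alleles is approximately two.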

APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol                        Definition
${}^{q}D$                     abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$                    phylogenetic diversity of order q
$S$                           total number of distinct elements (species/alleles)
$H$                           Shannon entropy
$K$                           number of regions
$J_k$                         number of local populations/communities within region k
$w_{jk}$                      weight given to population/community j of region k
$w_{+k}$                      weight given to region k
$D_\alpha^{(l)}$              alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$               beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$              gamma abundance diversity at level l of the hierarchy
$l$                           superscript to denote population/community, subregion, region, …
$D_\gamma$                    gamma abundance diversity at the ecosystem level
$N_{injk}$                    number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$                     total number of copies of allele/species i in population/community j of region k
$N_{+jk}$                     total number of alleles/individuals in population j of region k
$N_{++k}$                     total number of alleles/individuals in region k
$N_{+++}$                     total number of alleles/individuals in the ecosystem
$p_{i|jk}$                    frequency of allele/species i in population j of region k
$p_{i|+k}$                    frequency of allele/species i in region k
$p_{i|++}$                    frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$         alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$          alpha entropy for region k
$H_\alpha^{(l)}$              total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$              abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$                           number of branch segments in the phylogenetic tree
$L_i$                         length of branch i = 1, 2, 3, …, B
$a_i$                         total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$                     mean branch length
$T$                           depth of an ultrametric tree ($= \bar{T}$)
$Q$                           Rao's quadratic entropy
$I$                           phylogenetic entropy
$a_{i|jk}$                    total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$                    total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$                    total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$             alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$              beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$             gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$                   gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$         alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$          alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$              total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$           phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)


APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Bigl(\sum_{j=1}^{J} w_j p_{i|j}\Bigr) \ln\Bigl(\sum_{j'=1}^{J} w_{j'} p_{i|j'}\Bigr)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1. For the gamma and alpha entropies defined above for two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof: Since $f(x) = -x \ln x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Bigl(\sum_{j=1}^{J} w_j p_{i|j}\Bigr) \ln\Bigl(\sum_{j=1}^{J} w_j p_{i|j}\Bigr) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Bigl(\sum_{j=1}^{J} w_j p_{i|j}\Bigr) \ln\Bigl(\sum_{j=1}^{J} w_j p_{i|j}\Bigr) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \Bigl\{ \sum_{j=1}^{J} w_j p_{i|j} \ln\Bigl(\sum_{j'=1}^{J} w_{j'} p_{i|j'}\Bigr) \Bigr\} \\
&\le -\sum_{i=1}^{S} \Bigl\{ \sum_{j=1}^{J} w_j p_{i|j} \ln\bigl(w_j p_{i|j}\bigr) \Bigr\} \\
&= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.
\end{aligned}$$

The inequality holds because $\sum_{j'} w_{j'} p_{i|j'} \ge w_j p_{i|j}$ for every j and the logarithm is an increasing function. When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
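The bounds in Theorem S3.1 can also be checked numerically. The following sketch draws random allele-frequency distributions and random weights (an arbitrary illustrative setup with 4 populations and 6 alleles), counts any violation of $0 \le H_\gamma - H_\alpha \le -\sum_j w_j \ln w_j$, and then evaluates the two equality cases. It is a sanity check under these assumptions, not a substitute for the proof.

```python
import math
import random


def entropy(freqs):
    """Shannon entropy with natural logarithm; zero entries contribute nothing."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def entropy_gap(pops, weights):
    """Return (H_gamma - H_alpha, upper bound -sum_j w_j ln w_j)."""
    n_alleles = len(pops[0])
    pooled = [sum(w * pop[i] for w, pop in zip(weights, pops))
              for i in range(n_alleles)]
    h_alpha = sum(w * entropy(pop) for w, pop in zip(weights, pops))
    return entropy(pooled) - h_alpha, entropy(weights)


def random_simplex(rng, size):
    """Random non-negative vector normalized to sum to 1."""
    raw = [rng.random() for _ in range(size)]
    total = sum(raw)
    return [x / total for x in raw]


rng = random.Random(42)
tol = 1e-12
violations = 0
for _ in range(1000):
    weights = random_simplex(rng, 4)
    pops = [random_simplex(rng, 6) for _ in range(4)]
    gap, bound = entropy_gap(pops, weights)
    if gap < -tol or gap > bound + tol:
        violations += 1

# Equality cases: identical populations (gap = 0) and disjoint populations (gap = bound).
w = [0.2, 0.3, 0.5]
gap_identical, _ = entropy_gap([[0.5, 0.3, 0.2]] * 3, w)
distinct = [[0.5, 0.5, 0.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.3, 0.7]]
gap_distinct, bound_distinct = entropy_gap(distinct, w)

print("violations:", violations)
print("identical populations gap:", gap_identical)
print("distinct populations gap and bound:", gap_distinct, bound_distinct)
```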


From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:

(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.

(A2) Shannon differentiation measure is always non-decreasing when a new allele is added to a single population with any abundance. Here, the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: If multiple communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.
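Property (B) can be verified directly from Eq. (C1). The short derivation below is a sketch written in the notation of this appendix, with J communities and arbitrary weights $w_j$; it is offered as an illustration and is not the proof given in the cited reference. A shared species has pooled frequency $\sum_j w_j / S = 1/S$, and a species found only in community j has pooled frequency $w_j / S$.

```latex
\begin{align*}
H_\alpha &= \sum_{j=1}^{J} w_j \ln S = \ln S, \\
H_\gamma &= -A \cdot \frac{1}{S}\ln\frac{1}{S}
            - \sum_{j=1}^{J} (S - A)\,\frac{w_j}{S}\ln\frac{w_j}{S} \\
         &= \frac{A}{S}\ln S + \frac{S-A}{S}\Bigl(\ln S - \sum_{j=1}^{J} w_j \ln w_j\Bigr)
          = \ln S + \Bigl(1 - \frac{A}{S}\Bigr)\Bigl(-\sum_{j=1}^{J} w_j \ln w_j\Bigr), \\
\Delta_D &= \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j} = 1 - \frac{A}{S}.
\end{align*}
```

With J = 2, S = 10 and A = 4, this gives 0.6, matching the two-population example discussed in the main text.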

Proof: For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \bigl[H_\alpha^{(2)} - H_\alpha^{(1)}\bigr] + \bigl[H_\gamma - H_\alpha^{(2)}\bigr]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information; $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region; and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

Proof: To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Bigl(\sum_{k=1}^{K} w_{+k} p_{i|+k}\Bigr) \ln\Bigl(\sum_{k'=1}^{K} w_{+k'} p_{i|+k'}\Bigr).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele i in population j, with population weights $\{w_{1k}/w_{+k}, w_{2k}/w_{+k}, \cdots, w_{J_k k}/w_{+k}\}$, $j = 1, 2, \cdots, J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region k, i.e.,

$$H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}, \quad \text{where} \quad p_{i|+k} = \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}}\, p_{i|jk}.$$

The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem C1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4). From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \tag{C6}$$
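As a numerical illustration of Eqs. (C5) and (C6), the sketch below evaluates both measures in base R for the five-population allele-count matrix used in the R code section of this supplement (populations 1 and 2 in Region 1, populations 3 to 5 in Region 2). The helper `shannon` and all variable names are illustrative assumptions and are not part of the iDIP code.

```r
# Shannon entropy of a vector of counts (zero counts contribute nothing)
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Allele counts: alleles in rows, populations in columns
abun <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)                       # region index of each population

w_jk <- colSums(abun) / sum(abun)                # population weights w_jk
w_k  <- as.vector(tapply(w_jk, region, sum))     # region weights w_+k
abun_region <- t(rowsum(t(abun), region))        # counts pooled within regions

H_gamma  <- shannon(rowSums(abun))
H_alpha2 <- sum(w_k * apply(abun_region, 2, shannon))
H_alpha1 <- sum(w_jk * apply(abun, 2, shannon))

# Eq. (C5): differentiation among regions
delta2 <- (H_gamma - H_alpha2) / (-sum(w_k * log(w_k)))
# Eq. (C6): differentiation among populations within regions
delta1 <- (H_alpha2 - H_alpha1) / (-sum(w_jk * log(w_jk / w_k[region])))

round(c(delta2 = delta2, delta1 = delta1), 3)
```

Pooling raw counts within a region is equivalent to the weighted average of population frequencies because the weights are proportional to population totals. The two values should come out close to the 0.204 and 0.310 reported in the iDIP output for this example.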

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case each allele occurs in exactly one population, so we can decompose the gamma entropy as

$$
\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \left[ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln \left( w_{jk}\, p_{i|jk} \right) \right] \\
&= -\sum_{i=1}^{S} \left[ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \left( \ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k} \right) \right] \\
&= -\sum_{i=1}^{S} \left[ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln p_{i|jk} \right] - \sum_{i=1}^{S} \left[ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln \frac{w_{jk}}{w_{+k}} \right] - \sum_{i=1}^{S} \left[ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln w_{+k} \right] \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \left[ H_\alpha^{(2)} - H_\alpha^{(1)} \right] + \left[ H_\gamma - H_\alpha^{(2)} \right].
\end{aligned}
$$

In this special case, we have

$$H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} \quad \text{and} \quad H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}.$$
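The two extreme cases can be checked numerically. The following base R sketch uses two small hypothetical data sets (one with completely distinct populations, one with identical allele frequency distributions) and a helper named `differentiation`, which is an illustrative assumption that simply evaluates Eqs. (C5) and (C6).

```r
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Illustrative helper: returns the measures of Eq. (C5) and Eq. (C6)
differentiation <- function(abun, region) {
  region <- as.integer(factor(region))            # region index of each population
  w_jk <- colSums(abun) / sum(abun)               # population weights
  w_k  <- as.vector(tapply(w_jk, region, sum))    # region weights
  abun_region <- t(rowsum(t(abun), region))       # counts pooled within regions
  H_gamma  <- shannon(rowSums(abun))
  H_alpha2 <- sum(w_k * apply(abun_region, 2, shannon))
  H_alpha1 <- sum(w_jk * apply(abun, 2, shannon))
  c(among_regions     = (H_gamma - H_alpha2) / (-sum(w_k * log(w_k))),
    among_populations = (H_alpha2 - H_alpha1) /
                        (-sum(w_jk * log(w_jk / w_k[region]))))
}

region <- c("R1", "R1", "R2", "R2")

# Completely distinct populations: every allele occurs in exactly one population
distinct <- cbind(c(4, 6, 0, 0, 0, 0, 0), c(0, 0, 10, 0, 0, 0, 0),
                  c(0, 0, 0, 3, 7, 0, 0), c(0, 0, 0, 0, 0, 5, 25))
d_max <- differentiation(distinct, region)   # both measures are 1, up to rounding error

# Identical allele frequency distributions with unequal population sizes
same <- cbind(c(1, 2, 3), c(2, 4, 6), c(10, 20, 30), c(5, 10, 15))
d_min <- differentiation(same, region)       # both measures are 0, up to rounding error
```

Unequal population sizes are used on purpose: the bounds hold for arbitrary weights, which is the point of normalizing by the weight entropies.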

As we proved in Theorem C3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem C2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. Shannon "gamma" and "alpha" entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i\, a_{i|++} \ln a_{i|++},$$

$$I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i\, a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i\, a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \left[ I_\alpha^{(2)} - I_\alpha^{(1)} \right] + \left[ I_\gamma - I_\alpha^{(2)} \right]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem, so that we can obtain the corresponding differentiation measures.

Theorem C4: For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have

$$I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} \quad \text{and} \quad I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}.$$

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.

SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represent the ecosystem. Shannon entropies $H_{*}$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 panel labels: columns "Within", "Between", "Total" and "Decomposition"; rows "3 Ecosystem", "2 Region" and "1 Population or Community"; frequency vectors $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$ (populations), $p_{i|+1}$, $p_{i|+2}$ (regions) and $p_{i|++}$ (ecosystem); entropies $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$, $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$, $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$, $H_\alpha^{(1)}$, $H_\alpha^{(2)}$ and $H_\gamma$.]

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set, where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \cdots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \cdots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2 labels: tips $p_1$, $p_2$, $p_3$, $p_4$, $p_5$; internal nodes $p_1 + p_2$, $p_4 + p_5$ and $p_1 + p_2 + p_3$; root MRCA.]
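The extended abundance set of Figure S2 can be built with a tip-membership matrix. The base R sketch below does this for a tree with the same topology; the tip abundances and branch lengths are hypothetical values chosen so that the tree is ultrametric with depth 4, and the order-1 phylogenetic diversity is computed as $T \exp(I/T)$, which is our reading of the definitions used in this supplement.

```r
p <- c(0.10, 0.20, 0.30, 0.25, 0.15)    # hypothetical relative abundances p1..p5

# Rows: the 8 branches/nodes; columns: the 5 tips.
# An entry of 1 means that the tip is, or descends from, that node.
membership <- rbind(diag(5),
                    c(1, 1, 0, 0, 0),    # internal node above tips 1 and 2
                    c(1, 1, 1, 0, 0),    # internal node above tips 1, 2 and 3
                    c(0, 0, 0, 1, 1))    # internal node above tips 4 and 5
a <- as.vector(membership %*% p)         # extended set a1..a8

# Hypothetical branch lengths; every root-to-tip path sums to 4 (ultrametric)
L <- c(2, 2, 3, 1.5, 1.5, 1, 1, 2.5)

T_bar <- sum(L * a)                      # abundance-weighted mean distance to the tips
I     <- -sum(L * a * log(a))            # phylogenetic entropy
PD    <- T_bar * exp(I / T_bar)          # order-1 phylogenetic diversity (effective branch length)

a[6:8]                                   # p1+p2, p1+p2+p3, p4+p5
T_bar                                    # equals the tree depth for an ultrametric tree
```

The effective branch length `PD` lies between the tree depth and the total branch length `sum(L)` (Faith's PD), which gives a quick sanity check on any implementation.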

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entry. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

    Ecosystem
        Region 1
            Population 1
            Population 2
        Region 2
            Population 3
            Population 4
            Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

             Pop1  Pop2  Pop3  Pop4  Pop5
    Allele1     1    16     2    10    15
    Allele2     0     0     0     5    14
    Allele3     7    12    11     1     0
    Allele4     0     5    14     1    21
    Allele5     2     1     0    11    10
    Allele6     0     1     3     2     0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

             Pop1         Pop2         Pop3         Pop4         Pop5
    Level3   Ecosystem    Ecosystem    Ecosystem    Ecosystem    Ecosystem
    Level2   Region1      Region1      Region2      Region2      Region2
    Level1   Population1  Population2  Population3  Population4  Population5

For the above simple example, input data for R function iDIP (code given below) are shown below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

    # Run R function
    iDIP(Data, Struc)
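Because iDIP stops short of validating its inputs, it can help to check the stated requirements before calling it. The sketch below is one possible pre-check in base R; `check_idip_input` is a hypothetical helper introduced here for illustration and is not part of the authors' code.

```r
# Hypothetical helper: stops with an error if the stated input requirements are violated
check_idip_input <- function(abun, struc) {
  abun <- as.matrix(abun)
  stopifnot(
    is.numeric(abun),
    !anyNA(abun),                  # no "blank" or NA entries
    all(abun >= 0),                # frequencies cannot be negative
    all(colSums(abun) > 0),        # at least one species/allele in every population
    ncol(struc) == ncol(abun)      # one column of Struc per population
  )
  invisible(TRUE)
}

Data  <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
               c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
Struc <- rbind(rep("Ecosystem", 5),
               c(rep("Region1", 2), rep("Region2", 3)),
               paste0("Population", 1:5))

check_idip_input(Data, Struc)      # passes silently for the example data

bad <- Data
bad[2, 1] <- NA                    # simulate a blank cell
failed <- inherits(try(check_idip_input(bad, Struc), silent = TRUE), "try-error")

bad[is.na(bad)] <- 0               # replace NA by 0, as the instructions require
check_idip_input(bad, Struc)
```

Whether a blank should become 0 or another value depends on what the blank means in the data set; the replacement by 0 shown here assumes the allele was simply not observed.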

Output is given below.

                        [,1]
    D_gamma            5.272
    D_alpha.2          4.679
    D_alpha.1          3.540
    D_beta.2           1.127
    D_beta.1           1.322
    Proportion.2       0.300
    Proportion.1       0.700
    Differentiation.2  0.204
    Differentiation.1  0.310

We give simple interpretations for the above output (throughout, "effective" is in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 is interpreted as that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha.2 = 4.679 is interpreted as that each region contains 4.679 allele equivalents. D_beta.2 = 1.127 implies that there are 1.13 region equivalents. Thus, 4.679 x 1.127 = 5.272 (= D_gamma).

(1c) D_alpha.1 = 3.540 is interpreted as that each population within a region contains 3.540 allele equivalents. D_beta.1 = 1.322 is interpreted as that there are 1.32 population equivalents per region. Here, 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha.2).

(2) Proportion.2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion.1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation.2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation.1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.

    ## R code for Information-based Diversity Partitioning for Any Number of Levels
    ## Input data:
    ##   abun:  alleles-by-population frequency matrix data (or species-by-community
    ##          frequency matrix data)
    ##   struc: level-by-population hierarchical structure matrix (or level-by-community
    ##          hierarchical structure matrix)
    ## Output:
    ##   (1) Gamma (or total) diversity, alpha and beta diversity at each level
    ##   (2) Proportion of total beta information found at each level
    ##   (3) Differentiation (dissimilarity) for each level. For example, Differentiation.1
    ##       measures the mean dissimilarity among populations (level 1) within a region;
    ##       Differentiation.2 measures the mean dissimilarity among regions (level 2), etc.
    ## NOTE: iDIP cannot handle "blank" data or "NA" entry. You must replace "blank"
    ##       or "NA" in your data by 0 or any numerical value. Also, there must be at
    ##       least one species/allele in a population or a community.

    ## Main function iDIP
    iDIP = function(abun, struc) {
      n = sum(abun); N = ncol(abun)
      ga = rowSums(abun)
      gp = ga[ga > 0] / n
      G = sum(-gp * log(gp))                  # gamma entropy
      S = length(gp)

      H = nrow(struc)                         # number of hierarchical levels
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      # Lowest level (populations): weights, entropy of the weights, alpha entropy
      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[H - 1] = sum(wi * Ai)

      # Intermediate levels: pool the populations that share a label in row i of struc
      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          ai = matrix(0, ncol = NN, nrow = nrow(abun))
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              ai[, j] = abun[, II]
            } else {
              ai[, j] = rowSums(abun[, II])
            }
          }
          pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
          wi = colSums(ai) / sum(ai)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
          A[i - 1] = sum(wi * Ai)
        }
      }

      # Differentiation, proportion of beta information and beta diversity per level
      total = G - A[H - 1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      Gamma = exp(G); Alpha = exp(A)
      out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) <- c(paste0("D_gamma"),
                         paste0("D_alpha.", (H - 1):1),
                         paste0("D_beta.", (H - 1):1),
                         paste0("Proportion.", (H - 1):1),
                         paste0("Differentiation.", (H - 1):1))
      return(out)
    }

Phylogenetic Diversity (R function iDIP.phylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described for the species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree format. iDIP cannot handle "blank" or "NA" entry. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see the simple example given above for the species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity for illustration. A simulated phylogenetic tree for 6 species is given below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    rownames(Data) = paste0("Allele",1:6)
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
    Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

    # Run R function
    iDIP.phylo(Data, Struc, Tree)

Output is given below.

                   [,1]
    Faith's PD  321.525
    mean_T       94.169
    PD_gamma    274.388
    PD_alpha.2  255.194
    PD_alpha.1  223.231
    PD_beta.2     1.075
    PD_beta.1     1.143
    PD_prop.2     0.351
    PD_prop.1     0.649
    PD_diff.2     0.124
    PD_diff.1     0.149

The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 is interpreted as that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha.2 = 255.194 is interpreted as that the effective total branch length per region is 255.194. PD_beta.2 = 1.075 means that there are 1.08 region equivalents. Thus, 255.194 x 1.075 = 274.388 (= PD_gamma).

(3c) PD_alpha.1 = 223.231 is interpreted as that the effective total branch length per population within each region is 223.231. PD_beta.1 = 1.143 implies that there are 1.14 population equivalents per region. Here, 223.231 x 1.143 = 255.194 (= PD_alpha.2).

(4) PD_prop.2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop.1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff.2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff.1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.

    ## R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
    ## Input data:
    ##   abun:  alleles-by-population frequency matrix data (or species-by-community
    ##          frequency matrix data)
    ##   struc: level-by-population hierarchical structure matrix (or level-by-community
    ##          hierarchical structure matrix)
    ##   tree:  a Newick-format phylogenetic tree spanned by all focal species considered
    ##          in a study
    ## NOTE: iDIP cannot handle "blank" or "NA" entry. You must replace "blank" or "NA"
    ##       in your data by 0 or any numerical value. Also, there must be at least one
    ##       species/allele in a population or a community.
    ## Output:
    ##   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
    ##   (2) mean_T: weighted (by species abundance) mean of the distances from root node
    ##       to each of the tips in a phylogenetic tree. For an ultrametric tree,
    ##       mean_T = tree depth
    ##   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD
    ##       for each level
    ##   (4) PD_prop.1 and PD_prop.2 measure the proportions of total phylogenetic beta
    ##       information found in Level 1 and Level 2, respectively
    ##   (5) Phylogenetic differentiation (dissimilarity) for each level. For example,
    ##       PD_diff.1 measures the mean phylogenetic dissimilarity among communities
    ##       (Level 1) within a region (Level 2); PD_diff.2 measures the mean phylogenetic
    ##       dissimilarity among regions (Level 2), etc.

    ## Three packages "ade4", "ape" and "phytools" must be installed first
    install.packages("ade4")
    library(ade4)
    install.packages("ape")
    library(ape)
    install.packages("phytools")
    library(phytools)

    ## Main function iDIP.phylo
    iDIP.phylo = function(abun, struc, tree) {
      phyloData <- newick2phylog(tree)
      Temp <- as.matrix(abun[names(phyloData$leaves), ])
      nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
      # M[i, b] = 1 if branch/node b lies on the path from the root to leaf i
      M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
                 dimnames = list(names(phyloData$leaves), nodenames))
      for (i in 1:length(phyloData$leaves)) {
        M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
      }

      # pA: abundance of every branch/node (rows) in every population (columns)
      pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
                  dimnames = list(nodenames, colnames(abun)))
      for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
      pB = c(phyloData$leaves, phyloData$nodes)       # branch lengths

      n = sum(abun); N = ncol(abun)
      ga = rowSums(pA)
      gp = ga / n; TT = sum(gp * pB)                  # TT: abundance-weighted mean depth
      G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
      PD = sum(pB[gp > 0])                            # Faith's PD

      H = nrow(struc)
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[H - 1] = sum(wi * Ai)

      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
            } else {
              pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
            }
          }
          wi = ni / sum(ni)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
          A[i - 1] = sum(wi * Ai)
        }
      }

      total = G - A[H - 1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      # exp(G) is already on the branch-length scale because abundances were divided by TT
      Gamma = exp(G); Alpha = exp(A)
      out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) <- c(paste("Faith's PD"),
                         paste("mean_T"),
                         paste0("PD_gamma"),
                         paste0("PD_alpha.", (H - 1):1),
                         paste0("PD_beta.", (H - 1):1),
                         paste0("PD_prop.", (H - 1):1),
                         paste0("PD_diff.", (H - 1):1))
      return(out)
    }


ecology, community genetics and eco-evolutionary dynamics are helping to foster a convergence between the methods used to measure species and genetic diversity. Indeed, in the last decade, population geneticists have begun to extend the use of popular species diversity metrics to the measurement of genetic diversity by deriving mathematical expressions linking them with evolutionary parameters such as effective population size and mutation and migration rates (Chao et al., 2015; Sherwin, 2010; Sherwin et al., 2006; Smouse et al., 2015).

Regardless of this very recent methodological convergence, ecologists and population geneticists face the same challenges when trying to characterize how diversity components (alpha, beta) are structured geographically. These problems have been described in great detail in the literature (e.g., see Jost, 2007, 2010), so here we will only give a very brief summary. The first problem is that the commonly used within-community and within-population abundance diversity measures (e.g., Shannon-Wiener index and heterozygosity) are in fact entropies, meaning that they quantify the uncertainty in the species or allele identity of randomly sampled individuals or alleles, respectively. Importantly, these indices do not scale linearly with an increase in diversity, and some of them (e.g., heterozygosity) reach an asymptote for large values. The second problem is that the "within-" (alpha) and "between-" (beta) components of diversity are not independent. Intuitively, if beta depends on alpha, it would be impossible to compare beta diversities across all levels at which alpha diversities differ.

Partitioning components of diversity is central to progress on these problems. Ecologists have related the traditional alpha, beta and gamma diversity using both additive and multiplicative schemes of partitioning. On the other hand, population geneticists have always used the multiplicative scheme based on the partitioning of the probability of identity by descent of pairs of alleles (inbreeding coefficients, F). Although there has been some confusion (cf. Jost, 2008; Jost et al., 2010; Meirmans & Hedrick, 2011), it is easy to demonstrate that all estimators of $F_{ST}$, a parameter that quantifies genetic structure, including $G_{ST}$ (Nei, 1973) and θ (Weir & Cockerham, 1984), are based on the well-known multiplicative decomposition of Wright's (1951) F-statistics, $(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST})$, where all terms are entropy measures describing the uncertainty in the identity by descent of pairs of alleles when they are sampled from the whole set of populations (metapopulation) $(1 - F_{IT})$, from within the same population $(1 - F_{IS})$ or from two different populations $(1 - F_{ST})$.
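To make the multiplicative scheme concrete, here is a short worked example with hypothetical values $F_{IS} = 0.10$ and $F_{ST} = 0.20$ (illustrative numbers only, not taken from any data set in this study):

```latex
\begin{aligned}
1 - F_{IT} &= (1 - F_{IS})(1 - F_{ST}) = (0.90)(0.80) = 0.72, \\
F_{IT} &= 1 - 0.72 = 0.28, \\
\ln(1 - F_{IT}) &= \ln(1 - F_{IS}) + \ln(1 - F_{ST}).
\end{aligned}
```

The last line shows that a multiplicative decomposition becomes additive on the logarithmic scale, which is the same relationship that links additive entropy partitioning to multiplicative diversity partitioning.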

As mentioned earlier, ecologists engaged in intense debates on how to partition species diversity, but in a recent Ecology forum (Ellison, 2010), contributors agreed that a first step towards reaching a consensus was to adopt Hill numbers to measure diversity. Discussions among population geneticists are less advanced because of their traditional focus on the use of genetic polymorphism data to estimate important evolutionary parameters, which requires that genetic diversity statistics be effective measures of the causes and consequences of genetic differentiation (e.g., Whitlock, 2011). Much theoretical work is still needed to demonstrate that diversity measures based on information theory do satisfy this requirement. Here, instead, we argue that the adoption of Hill numbers in population genetics is also a good starting point to reach a consensus on how to partition genetic diversity. In what follows, we first introduce Hill numbers and then present a weighted information-based decomposition framework applicable to both community and population genetics studies.

3 | OVERVIEW OF HILL NUMBERS

There are now many articles describing the application of Hill numbers. Here, we follow Jost (2006), who reintroduced their use in ecology. As Jost (2006) noted, most diversity indices are in fact entropies that measure the uncertainty in the identity of species (or alleles) in a sample. However, true diversity measures should provide estimates of the number of distinct elements (species or alleles) in an aggregate (community or population). To derive such measures, we first note that diversity indices create equivalence classes among aggregates in the sense that all aggregates with the same diversity index value can be considered as equivalent. For example, all populations with the same heterozygosity value are equivalent in terms of this index, even if they have radically different allele frequencies (see Appendix S1 for an example). Moreover, for any given heterozygosity, there will be an "ideal" population in which all alleles are equally frequent. It is therefore possible to define an "effective number of elements" (alleles in this example) as the number of equally frequent elements in an "ideal aggregate" that has the same diversity index value as the "real aggregate." An example of effective number in an ecological context is the effective number of species introduced by MacArthur (1965), while an equivalent concept in population genetics is the effective number of alleles (Kimura & Crow, 1964).

Note that the concept of effective population size, $N_e$, used in population genetics is analogous to that of Hill numbers but is based on a rather different concept. More precisely, $N_e$ is defined as the number of individuals in an ideal (Wright–Fisher) population that has the same magnitude of random genetic drift as the real population being studied. There are different ways in which we can measure the strength of genetic drift, the most common being change in average inbreeding coefficient, change in allele frequency variance and rate of loss of heterozygosity, and each leads to a different type of effective size. Thus, the ideal and the real populations are equivalent in terms of the rate of loss of genetic diversity and not in terms of equal representation of distinct individuals. Probably the only similarity between $N_e$ and the rationale underlying Hill numbers is in the sense that all the individuals in the ideal population contribute equally (on average) to the gene pool of the next generation.

The application of the above-stated logic to any of the many different entropy measures used in ecology and population genetics yields a single expression for diversity:

$${}^{q}D \equiv \left( \sum_{i=1}^{S} p_i^{q} \right)^{1/(1-q)}, \tag{1}$$

where S denotes the number of species or alleles, $p_i$ denotes the relative abundance or frequency of species or allele i, and the exponent and superscript q is the order of the diversity and indicates the sensitivity of ${}^{q}D$, the numbers equivalent of the diversity measure being used, to common and rare elements (Jost, 2006). The diversity of order zero (q = 0) is completely insensitive to species or allele frequencies and is known, respectively, as species or allelic richness, depending on whether it is applied to species or allele frequency data. The diversity of order one (q = 1) weights the contribution of each species or allele by their frequency, without favouring either common or rare species/alleles. Although Equation 1 is not defined for q = 1, its limit exists (Jost, 2006) and equals $\exp(H)$ (Equation 2), where $H = -\sum_{i=1}^{S} p_i \ln p_i$ is the Shannon entropy. All values of q greater than unity disproportionally favour the most common species or allele. For example, the Simpson concentration and the Gini–Simpson index, which are, respectively, equivalent to expected homozygosity and expected heterozygosity when applied to allele frequency data, lead to diversities of order 2 (Equation 3) and give the same effective number of species or alleles.

It is worth emphasizing that among all these different number equivalents or true diversity measures, the diversity of order 1 is key because of its ability to weigh elements precisely by their frequency, without favouring either rare or common elements (Jost, 2006). Therefore, we will use this measure to define our new framework for diversity decomposition.
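The behaviour of Hill numbers across orders can be seen in a few lines of base R. The sketch below is illustrative only: `hill_number` is a hypothetical helper name, and the two count vectors are made-up examples of an even and a highly uneven aggregate.

```r
# Hill number (effective number of elements) of order q for a vector of
# counts or relative frequencies
hill_number <- function(x, q) {
  p <- x[x > 0] / sum(x)          # relative frequencies of the observed elements
  if (q == 1) {
    exp(-sum(p * log(p)))         # limit of Equation 1 as q -> 1: exp(Shannon entropy)
  } else {
    sum(p^q)^(1 / (1 - q))        # Equation 1
  }
}

even   <- c(25, 25, 25, 25)       # four equally frequent alleles
uneven <- c(97, 1, 1, 1)          # one dominant allele and three rare ones

d_even   <- sapply(c(0, 1, 2), function(q) hill_number(even, q))
d_uneven <- sapply(c(0, 1, 2), function(q) hill_number(uneven, q))

d_even     # all three orders give 4 effective alleles
d_uneven   # richness stays at 4, while orders 1 and 2 drop towards 1
```

For the even aggregate every order returns the actual number of alleles, which is what makes these measures interpretable as "effective numbers"; for the uneven aggregate the value decreases as q increases because common alleles are weighted more heavily.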

4 | WEIGHTED INFORMATION-BASED DECOMPOSITION FRAMEWORK (Q = 1)

Our decomposition framework is focused on the information-based diversity measure (Hill number of order q = 1). In what follows, we first describe the framework in terms of abundance (species/genetic) diversities and then we provide an equivalent formulation in terms of phylogenetic diversity. For simplicity, we will use the notation D to refer to abundance diversities and PD to refer to phylogenetic diversities, both of order q = 1. Appendix S2 lists all notation and definitions of the parameters and variables we used.

4.1 | Formulation in terms of abundance diversity

Here, we develop a framework applicable to both species (abundance, presence–absence, biomass) and genetic data to estimate alpha, beta and gamma diversities (i.e., diversity components) across different levels of a hierarchical spatial structure. In this section, we consider a very simple example of an ecosystem subdivided into multiple regions, each of which in turn is subdivided into a number of communities when considering species data, or a number of populations when considering genetic data. However, our formulation is applicable to any number of levels within a spatially hierarchical partitioning scheme and their associated number of communities and populations at each level (nested scale), such as the example considered in our simulation study below (see Figure 1). Indeed, the framework described here allows decomposing species and genetic information on an equal footing, thus allowing contrasting diversity components across communities and populations. In other words, if genetic and species abundance (or presence–absence) data are available for every population and every species, then genetic and species diversity components can be contrasted within and among spatial scales as well as across different phylogenetic levels. Note that our proposed framework is based on diversities of order q = 1, which are less sensitive than diversities of higher order to the fact that genetic information is not available for all individuals in a population but is rather based on subsamples of individuals within populations. As such, using q = 1 allows one to decompose genetic variation consistently across different spatial subdivision levels that may vary in abundance.

The final objective was to decompose the global (ecosystem) diversity into its regional and community/population-level components. We do this using the well-known additive property of Shannon entropy across hierarchical levels (and thus multiplicative partitioning of diversity) (Batty, 1976; Jost, 2007). Table 1 presents the diversities (number equivalents) that need to be estimated at each level of the hierarchy. For each level, there will be one value corresponding to species diversity and another corresponding to allelic (genetic) diversity of a particular species at a given locus (or an average across loci). Figure S1 provides a schematic representation of the calculation of diversities.

From Table 1, it is apparent that we only need to use Equation 2 to calculate three diversity indices, namely $D_\alpha^{(1)}$, $D_\alpha^{(2)}$ and $D_\gamma$. These diversity measures are defined in terms of relative abundances of the distinct elements (species or alleles) at the respective levels of the hierarchy. In what follows, we first present the framework as applied to allele count data and then explain how a simple change in the definition of a single parameter allows the application of the same framework to species abundance data. We assume that we are considering a diploid species (but the scheme can be easily generalized for polyploid species) and focus on the diversity of order q = 1, which is based on the Shannon entropy (see Equation 1).

Genetic diversity indices are calculated separately for each locus, so we focus here on a locus with S alleles. Additionally, we consider an ecosystem subdivided into K regions, each having $J_k$ local populations. Let $N_{injk}$ be the number of diploid individuals with n (= 0, 1, 2) copies of allele i in population j and region k. Then, the total number of copies of allele i in population j and region k is $N_{ijk} = \sum_{n=0}^{2} n N_{injk}$, and from this, we can derive the total number of alleles in population j and region k as $N_{+jk} = \sum_{i=1}^{S} N_{ijk}$, the total number of alleles in region k as $N_{++k} = \sum_{j=1}^{J_k} N_{+jk}$, and the total number of alleles in the ecosystem as $N_{+++} = \sum_{k=1}^{K} N_{++k}$. All allele frequencies can be derived from these allele counts. For example, the relative frequency of allele i in any given population j within region k is $p_{i|jk} = N_{ijk}/N_{+jk}$. In the case of region- and ecosystem-level allele frequencies, we pool over populations within regions and over all regions and populations within an ecosystem, respectively. We define the weight for population j and region k as $w_{jk} = N_{+jk}/N_{+++}$; the weight for region k thus becomes $w_{+k} = \sum_{j=1}^{J_k} w_{jk} = N_{++k}/N_{+++}$. Table 2 describes how allele/species relative frequencies at each level are calculated in terms of these weight functions.

$${}^{1}D = \exp\left( -\sum_{i=1}^{S} p_i \ln p_i \right) = \exp(H) \tag{2}$$

$${}^{2}D = 1 \Big/ \left( \sum_{i=1}^{S} p_i^{2} \right) \tag{3}$$
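The bookkeeping from genotype tallies to frequencies and weights can be sketched in base R as follows. All numbers are hypothetical, and the object names (`geno`, `counts`, `region` and so on) are illustrative assumptions rather than names used by the authors.

```r
# Hypothetical genotype tallies for one population at a biallelic locus:
# rows are alleles, columns give the number of diploid individuals
# carrying n = 0, 1 or 2 copies of that allele (10 individuals in total)
geno <- rbind(A1 = c(n0 = 2, n1 = 4, n2 = 4),
              A2 = c(n0 = 4, n1 = 4, n2 = 2))
N_i <- as.vector(geno %*% c(0, 1, 2))        # N_ijk = sum over n of n * N_injk

# Allele counts for three populations (alleles in rows); Pop1 uses the counts above
counts <- cbind(Pop1 = N_i, Pop2 = c(5, 15), Pop3 = c(20, 20))
rownames(counts) <- rownames(geno)
region <- c("R1", "R1", "R2")                # region of each population

N_pop <- colSums(counts)                     # N_+jk
p_pop <- sweep(counts, 2, N_pop, "/")        # p_i|jk
w_pop <- N_pop / sum(counts)                 # w_jk
w_reg <- as.vector(tapply(w_pop, region, sum))          # w_+k

counts_reg <- t(rowsum(t(counts), region))   # counts pooled within regions
p_reg <- sweep(counts_reg, 2, colSums(counts_reg), "/") # p_i|+k
p_eco <- rowSums(counts) / sum(counts)       # p_i|++

# Ecosystem frequencies are the weighted average of region frequencies
as.vector(p_reg %*% w_reg)
```

The last line reproduces `p_eco`, which illustrates why pooling counts and averaging frequencies with weights proportional to sample totals give the same result.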

Using these frequencies, we can calculate the genetic diversities at each level of spatial organization. Table 3 presents the formulas for $D_\alpha^{(1)}$, $D_\alpha^{(2)}$ and $D_\gamma$; all other diversity measures can be derived from them (see Table 1). In the case of the ecosystem diversity, this amounts to simply replacing $p_i$ in Equation 2 by $p_{i|++}$, the allele frequency at the ecosystem level (see Table 2). To calculate the diversity at the regional level, we first calculate the entropy $H_{\alpha k}^{(2)}$ for each individual region k and then obtain the weighted average over all regions, $H_\alpha^{(2)}$. Finally, we calculate the exponential of the region-level entropy to obtain $D_\alpha^{(2)}$, the alpha diversity at the regional level. We proceed in a similar fashion to obtain $D_\alpha^{(1)}$, the diversity at the population level, but in this case, we need to average over regions and populations within regions.

The calculation of the equivalent diversities based on species count data can be carried out using the exact same procedure described above, but in this case, $N_{ijk}$ represents the number of individuals of species i in population j and region k. All formulas for gamma, alpha and beta, along with the differentiation measures at each level, are given in Table 3. The formulas can be directly generalized to any arbitrary number of levels (see Section 5).
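The link between the additive decomposition of entropy and the multiplicative decomposition of diversity can be written out explicitly. The sketch below assumes that the beta components are defined as ratios of successive diversities, $D_\beta^{(2)} = D_\gamma / D_\alpha^{(2)}$ and $D_\beta^{(1)} = D_\alpha^{(2)} / D_\alpha^{(1)}$; this is our reading of Table 1 and should be checked against it.

```latex
\begin{aligned}
\ln D_\gamma = H_\gamma
  &= H_\alpha^{(1)}
   + \left[ H_\alpha^{(2)} - H_\alpha^{(1)} \right]
   + \left[ H_\gamma - H_\alpha^{(2)} \right] \\
  &= \ln D_\alpha^{(1)} + \ln D_\beta^{(1)} + \ln D_\beta^{(2)}, \\[4pt]
D_\gamma &= D_\alpha^{(1)} \times D_\beta^{(1)} \times D_\beta^{(2)} .
\end{aligned}
```

Each bracketed entropy difference is the logarithm of one beta diversity, so adding entropy components across levels corresponds to multiplying effective numbers across levels.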

4.2 | Formulation in terms of phylogenetic diversity

We first present an overview of phylogenetic diversity measures applied to a single nonhierarchical case (henceforth referred to as single aggregate for brevity) and then extend it to consider a hierarchically structured system.

4.2.1 | Phylogenetic diversity measures in a single aggregate

To formulate phylogenetic diversity in a single aggregate, we assume that all species or alleles in an aggregate are connected by a rooted ultrametric or nonultrametric phylogenetic tree, with all species/alleles as tip nodes. All phylogenetic diversity measures discussed below are computed from a given fixed tree base (or a time reference point) that is ancestral to all species/alleles in the aggregate. A convenient time reference point is the age of the root of the phylogenetic tree spanned by all elements. Assume that there are B branch segments in the tree, and thus there are B corresponding nodes, B ≥ S. The set of species/alleles is expanded to include also the internal nodes as well as the terminal nodes representing species/alleles, which will then be the first S elements (see Figure S2).

FIGURE 1. The spatial representation of 32 populations organized into a spatial hierarchy based on three scale levels: subregions (eight populations each), regions (16 populations each) and the ecosystem (all 32 populations). The dendrogram (upper panel: hierarchical representation of levels) represents the spatial relationship (i.e., geographic distance), in which each tip represents a population found in a particular site (lower panel). The cartographic representation (lower panel) represents the spatial distribution of these same populations along a geographic coordinate system.

Let $L_i$ denote the length of branch i in the tree, i = 1, 2, …, B. We first expand the set of relative abundances of elements $(p_1, p_2, \cdots, p_S)$ (see Equation 1) to a larger set $\{a_i, i = 1, 2, \cdots, B\}$ by defining $a_i$ as the total relative abundance of the elements descended from the ith node/branch, i = 1, 2, …, B. In phylogenetic diversity, an important parameter is the mean branch length $\bar{T}$, the abundance-weighted mean of the distances from the tree base to each of the terminal branch tips, that is, $\bar{T} = \sum_{i=1}^{B} L_i a_i$. For an ultrametric tree, the mean branch length is simply reduced to the tree depth T; see Figure 1 in Chao, Chiu and Jost (2010) for an example. For simplicity, our following formulation of phylogenetic diversity is based on ultrametric trees. The extension to nonultrametric trees is straightforward (via replacing T by $\bar{T}$ in all formulas).

TABLE 1. Various diversities in a hierarchically structured system and their decomposition based on diversity measure $D = {}^{1}D$ (Hill number of order q = 1 in Equation 2); for phylogenetic diversity decomposition, replace D with $PD = {}^{1}PD$ (phylogenetic diversity measure of order q = 1 in Equation 5); see Table 3 for all formulas for D and PD. The superscripts (1) and (2) denote the hierarchical level of focus.

| Hierarchical level | Diversity: Within | Diversity: Between | Diversity: Total | Decomposition |
|---|---|---|---|---|
| 3: Ecosystem | − | − | $D_{\gamma}$ | $D_{\gamma} = D_{\alpha}^{(1)} D_{\beta}^{(1)} D_{\beta}^{(2)}$ |
| 2: Region | $D_{\alpha}^{(2)}$ | $D_{\beta}^{(2)} = D_{\gamma}^{(2)} / D_{\alpha}^{(2)}$ | $D_{\gamma}^{(2)} = D_{\gamma}$ | $D_{\gamma} = D_{\alpha}^{(2)} D_{\beta}^{(2)}$ |
| 1: Community or population | $D_{\alpha}^{(1)}$ | $D_{\beta}^{(1)} = D_{\gamma}^{(1)} / D_{\alpha}^{(1)}$ | $D_{\gamma}^{(1)} = D_{\alpha}^{(2)}$ | $D_{\alpha}^{(2)} = D_{\alpha}^{(1)} D_{\beta}^{(1)}$ |

TABLE 2. Calculation of allele/species relative frequencies at the different levels of the hierarchical structure.

| Hierarchical level | Species/allele relative frequency |
|---|---|
| Population | $p_{i \mid jk} = N_{ijk} / N_{+jk} = N_{ijk} / \sum_{i=1}^{S} N_{ijk}$ |
| Region | $p_{i \mid +k} = N_{i+k} / N_{++k} = \sum_{j=1}^{J_k} (w_{jk} / w_{+k})\, p_{i \mid jk}$ |
| Ecosystem | $p_{i \mid ++} = N_{i++} / N_{+++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i \mid jk}$ |

Chao et al. (2010, 2014) generalized Hill numbers to a class of phylogenetic diversity of order q, $^{q}PD$, derived as

$$^{q}PD = \left\{\sum_{i=1}^{B} L_i \left(\frac{a_i}{T}\right)^{q}\right\}^{1/(1-q)} \tag{4}$$

This measure quantifies the effective total branch length during the time interval from T years ago to the present. If q = 0, then $^{0}PD = \sum_{i=1}^{B} L_i$, which is the well-known Faith's PD, the sum of the branch lengths of a phylogenetic tree connecting all species. However, this measure does not consider species abundances. Rao's quadratic entropy Q (Rao & Nayak, 1985) is a widely used measure which takes into account both phylogeny and species abundances. This measure is a generalization of the Gini-Simpson index and quantifies the average phylogenetic distance between any two individuals randomly selected from the assemblage. Chao et al. (2010) showed that the $^{q}PD$ measure of order q = 2 is a simple transformation of quadratic entropy, that is, $^{2}PD = T/(1 - Q/T)$. Again, here we focus on the $^{q}PD$ measure of order q = 1, which can be expressed as a function of the phylogenetic entropy (Allen, Kon & Bar-Yam, 2009):

$$^{1}PD = \lim_{q \to 1} {}^{q}PD = \exp\left[-\sum_{i=1}^{B} L_i \frac{a_i}{T} \ln\left(\frac{a_i}{T}\right)\right] \equiv T \exp(I/T) \tag{5}$$

Here I denotes the phylogenetic entropy:

$$I = -\sum_{i=1}^{B} L_i a_i \ln a_i \tag{6}$$

which is a generalization of Shannon's entropy that incorporates phylogenetic distances among elements. Note that when there are only tip nodes and all branches have unit length, then we have T = 1 and $^{q}PD$ reduces to the Hill number of order q (in Equation 1).

4.2.2 | Phylogenetic diversity decomposition in a multiple-level hierarchically structured system

The single-aggregate formulation can be extended to consider a hierarchical spatially structured system. For the sake of simplicity, we consider three levels (ecosystem, region and community/population), as we did for the species/allelic diversity decomposition. Assume that there are S elements in the ecosystem. For the rooted phylogenetic tree spanned by all S elements in the ecosystem, we define root (or a time reference point), number of nodes/branches B and branch length $L_i$ in a similar manner as those in a single aggregate.

For the tip nodes, as in the framework of species and allelic diversity (in Table 2), define $p_{i|jk}$, $p_{i|+k}$ and $p_{i|++}$, i = 1, 2, …, S, as the ith species or allele relative frequencies at the population, regional and ecosystem level, respectively. To expand these relative frequencies to the branch set, we define $a_{i|jk}$, i = 1, 2, …, B, as the summed relative abundance of the species/alleles descended from the ith node/branch in population j and region k, with similar definitions for $a_{i|+k}$ and $a_{i|++}$, i = 1, 2, …, B; see Figure 1 of Chao et al. (2015) for an illustrative example. The decomposition for phylogenetic diversity is similar to that for Hill numbers presented in Table 1, except that now all measures are replaced by phylogenetic diversity. The corresponding phylogenetic gamma, alpha and beta diversities at each level are given in Table 3, along with the corresponding differentiation measures. Appendix S3 presents all mathematical derivations and discusses the desirable monotonicity and "true dissimilarity" properties that our proposed differentiation measures possess.
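To make the single-aggregate phylogenetic measure of Equations 4 to 6 concrete, the following Python sketch evaluates $^{q}PD$ for a small ultrametric tree. It is an illustration under stated assumptions rather than the iDIP.phylo code: the function name `phylo_diversity`, the representation of a tree as a list of (branch length, node abundance) pairs and the three-species example are all hypothetical.

```python
import math

def phylo_diversity(branches, q):
    """qPD of Equation 4 for an ultrametric tree (hypothetical helper).

    branches is a list of (L_i, a_i) pairs: the length of branch i and the
    summed relative abundance of the tips descended from it.
    """
    depth = sum(length * a for length, a in branches)  # T = sum_i L_i a_i
    if q == 1:
        # Equation 5: T * exp(I / T), with I the phylogenetic entropy (Equation 6)
        entropy = -sum(length * a * math.log(a) for length, a in branches if a > 0)
        return depth * math.exp(entropy / depth)
    total = sum(length * (a / depth) ** q for length, a in branches if a > 0)
    return total ** (1.0 / (1.0 - q))

# Assumed example tree of depth 2: tips A and B (abundance 0.25 each) join one
# time unit before the present; their ancestor and tip C (abundance 0.5) join at the root.
tree = [
    (1.0, 0.25),  # tip branch to A
    (1.0, 0.25),  # tip branch to B
    (2.0, 0.50),  # tip branch to C
    (1.0, 0.50),  # internal branch above A and B
]
pd0 = phylo_diversity(tree, 0)  # Faith's PD: the sum of branch lengths
pd1 = phylo_diversity(tree, 1)
pd2 = phylo_diversity(tree, 2)

# With tips only and unit branch lengths, the measure reduces to a Hill number.
star = [(1.0, 0.5), (1.0, 0.5)]
hill1 = phylo_diversity(star, 1)
```

The last two lines illustrate the remark that the measure collapses to the ordinary Hill number when T = 1: two equally abundant tips on unit branches give an effective number of two.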

5 | IMPLEMENTATION OF THE FRAMEWORK BY MEANS OF AN R PACKAGE

The framework described above has been implemented in the R function iDIP (information-based Diversity Partitioning), which is provided as Data S1. We also provide a short introduction with a simple example data set to explain how to obtain numerical results equivalent to those provided in Tables 4 and 5 below for the Hawaiian archipelago example data set.

The R function iDIP requires two input matrices:

1. Abundance data, specifying species/alleles (rows) raw or relative abundances for each population/community (columns).

2. Structure matrix, describing the hierarchical structure of spatial subdivision; see a simple example given in Data S1. There is no limit to the number of spatial subdivisions.

The output includes (i) gamma (or total) diversity, alpha and beta diversity for each level; (ii) proportion of total beta information (among aggregates) found at each level; and (iii) mean differentiation (dissimilarity) at each level.
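As a rough illustration of how two such inputs relate to the weights $w_{jk}$ and $w_{+k}$, the Python sketch below builds a toy abundance matrix and structure matrix and derives the weights from them. The layout shown (one row of the structure matrix per level, one column per population) and the name `aggregate_weights` are assumptions made for this example; the format actually expected by iDIP is the one documented in Data S1.

```python
# Hypothetical layout, for illustration only.
abundance = [            # rows: species/alleles, columns: populations
    [10, 5, 0, 0],
    [0, 5, 20, 10],
]
structure = [            # one row per level (top level first), one column per population
    ["Ecosystem", "Ecosystem", "Ecosystem", "Ecosystem"],
    ["Region1", "Region1", "Region2", "Region2"],
    ["Pop1", "Pop2", "Pop3", "Pop4"],
]

def aggregate_weights(abundance, labels):
    """Relative size of each aggregate named in labels (one label per column)."""
    sizes = {}
    for col, label in enumerate(labels):
        sizes[label] = sizes.get(label, 0) + sum(row[col] for row in abundance)
    grand_total = sum(sizes.values())
    return {label: size / grand_total for label, size in sizes.items()}

region_weights = aggregate_weights(abundance, structure[1])      # w_+k
population_weights = aggregate_weights(abundance, structure[2])  # w_jk
```

Because the weights are derived from column totals, the same helper works for any row of the structure matrix, which is one simple way to accommodate an arbitrary number of levels.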

We also provide the R function iDIP.phylo, which implements an information-based decomposition of phylogenetic diversity and therefore can take into account the evolutionary history of the species being studied. This function requires the two matrices mentioned above plus a phylogenetic tree in Newick format. For interested users without knowledge of R, we also provide an online version available from https://chao.shinyapps.io/iDIP/. This interactive web application was developed using Shiny (https://shiny.rstudio.com). The webpage contains tabs providing a short introduction describing how to use the tool, along with a detailed User's Guide, which provides proper interpretations of the output through numerical examples.

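6 | SIMULATION STUDY TO SHOW THE CHARACTERISTICS OF THE FRAMEWORK

Here we describe a simple simulation study to demonstrate the utility and numerical behaviour of the proposed framework. We considered an ecosystem composed of 32 populations divided into four hierarchical levels (ecosystem, region, subregion, population; Figure 1). The number of populations at each level was kept constant across all simulations (i.e., ecosystem with 32 populations, regions with 16 populations each and subregions with eight populations each). Note that here we used a hierarchy with four spatial subdivisions instead of three levels as used in the presentation of the framework. This decision was based on the fact that we wanted to simplify the presentation of calculations (three levels used), and in the simulations (four levels used) we wanted to verify the performance of the framework in a more in-depth manner.

TABLE 3. Formulas for α, β and γ, along with differentiation measures, at each hierarchical level of spatial subdivision for species/allelic diversity and phylogenetic diversity. Here $D = {}^{1}D$ (Hill number of order q = 1 in Equation 2), $PD = {}^{1}PD$ (phylogenetic diversity of order q = 1 in Equation 5), T denotes the depth of an ultrametric tree, H = Shannon entropy (Equation 2), I = phylogenetic entropy (Equation 6).

| Hierarchical level | Diversity | Species/allelic diversity | Phylogenetic diversity |
|---|---|---|---|
| Level 3: Ecosystem | gamma | $D_{\gamma} = \exp\left\{-\sum_{i=1}^{S} p_{i \mid ++} \ln p_{i \mid ++}\right\} \equiv \exp(H_{\gamma})$ | $PD_{\gamma} = T \times \exp\left\{-\sum_{i=1}^{B} L_i a_{i \mid ++} \ln a_{i \mid ++} / T\right\} \equiv T \times \exp(I_{\gamma}/T)$ |
| Level 2: Region | gamma | $D_{\gamma}^{(2)} = D_{\gamma}$ | $PD_{\gamma}^{(2)} = PD_{\gamma}$ |
| | alpha | $D_{\alpha}^{(2)} = \exp(H_{\alpha}^{(2)})$, where $H_{\alpha}^{(2)} = \sum_{k} w_{+k} H_{\alpha k}^{(2)}$ and $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i \mid +k} \ln p_{i \mid +k}$ | $PD_{\alpha}^{(2)} = T \times \exp(I_{\alpha}^{(2)}/T)$, where $I_{\alpha}^{(2)} = \sum_{k} w_{+k} I_{\alpha k}^{(2)}$ and $I_{\alpha k}^{(2)} = -\sum_{i=1}^{B} L_i a_{i \mid +k} \ln a_{i \mid +k}$ |
| | beta | $D_{\beta}^{(2)} = D_{\gamma}^{(2)} / D_{\alpha}^{(2)}$ | $PD_{\beta}^{(2)} = PD_{\gamma}^{(2)} / PD_{\alpha}^{(2)}$ |
| Level 1: Population or community | gamma | $D_{\gamma}^{(1)} = D_{\alpha}^{(2)}$ | $PD_{\gamma}^{(1)} = PD_{\alpha}^{(2)}$ |
| | alpha | $D_{\alpha}^{(1)} = \exp(H_{\alpha}^{(1)})$, where $H_{\alpha}^{(1)} = \sum_{j,k} w_{jk} H_{\alpha jk}^{(1)}$ and $H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i \mid jk} \ln p_{i \mid jk}$ | $PD_{\alpha}^{(1)} = T \times \exp(I_{\alpha}^{(1)}/T)$, where $I_{\alpha}^{(1)} = \sum_{j,k} w_{jk} I_{\alpha jk}^{(1)}$ and $I_{\alpha jk}^{(1)} = -\sum_{i=1}^{B} L_i a_{i \mid jk} \ln a_{i \mid jk}$ |
| | beta | $D_{\beta}^{(1)} = D_{\gamma}^{(1)} / D_{\alpha}^{(1)}$ | $PD_{\beta}^{(1)} = PD_{\gamma}^{(1)} / PD_{\alpha}^{(1)}$ |

Differentiation among aggregates at each level:

| Hierarchical level | Species/allelic diversity | Phylogenetic diversity |
|---|---|---|
| Level 2: Among regions | $\Delta_{D}^{(2)} = \dfrac{H_{\gamma} - H_{\alpha}^{(2)}}{-\sum_{k} w_{+k} \ln w_{+k}}$ | $\Delta_{PD}^{(2)} = \dfrac{I_{\gamma} - I_{\alpha}^{(2)}}{-T \sum_{k} w_{+k} \ln w_{+k}}$ |
| Level 1: Population/community within region | $\Delta_{D}^{(1)} = \dfrac{H_{\alpha}^{(2)} - H_{\alpha}^{(1)}}{-\sum_{j,k} w_{jk} \ln(w_{jk}/w_{+k})}$ | $\Delta_{PD}^{(1)} = \dfrac{I_{\alpha}^{(2)} - I_{\alpha}^{(1)}}{-T \sum_{j,k} w_{jk} \ln(w_{jk}/w_{+k})}$ |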

We explored six scenarios varying in the degree of genetic structuring, from very strong (Figure 2, top left panel) to very weak (Figure 2, bottom right panel), and for each we generated spatially structured genetic data for 10 unlinked bi-allelic loci using an algorithm loosely based on the genetic model of Coop, Witonsky, Di Rienzo and Pritchard (2010). More explicitly, to generate correlated allele frequencies across populations for bi-allelic loci, we drew 10 random vectors of dimension 32 from a multivariate normal distribution with mean zero and a covariance matrix corresponding to the particular genetic structure scenario being considered. To construct the covariance matrix, we first assumed that the covariance between populations decreased with distance, so that the off-diagonal elements (covariances) for closest geographic neighbours were set to 4, for the second nearest neighbours were set to 3, and so on; as such, the main diagonal values (variance) were set to 5. By multiplying the off-diagonal elements of this variance-covariance matrix by a constant (δ), we manipulated the strength of the spatial genetic structure from strong (δ = 0.1; Figure 2) to weak (δ = 6; Figure 2). Delta values were chosen to demonstrate gradual changes in estimates across diversity components. Using this procedure, we generated a matrix of random normally distributed N(0, 1) deviates $\varepsilon_{il}$ for each population i and locus l. The random deviates were transformed into allele frequencies constrained between 0 and 1 using the simple transform

$$p_{il} = \begin{cases} 0 & \text{if } \varepsilon_{il} < 0 \\ \varepsilon_{il} & \text{if } 0 \le \varepsilon_{il} \le 1 \\ 1 & \text{if } \varepsilon_{il} > 1 \end{cases}$$

where $p_{il}$ is the relative frequency of allele A1 at the lth locus in population i, and therefore $q_{il} = (1 - p_{il})$ is the relative frequency of allele A2. Each bi-allelic locus was analysed separately by our framework, and estimated values of $D_{\gamma}$, $D_{\alpha}$ and $D_{\beta}$ for each spatial level (see Figure 1) were averaged across the 10 loci.

To simulate a realistic distribution of number of individuals across populations, we generated random values from a log-normal distribution with mean 0 and log of standard deviation 1; these values were then multiplied by randomly generated deviates from a Poisson distribution with λ = 30 to obtain a wide range of population/community sizes. Rounded values (to mimic abundances of individuals) were then multiplied by $p_{il}$ and $q_{il}$ to generate allele abundances. Given that the number of individuals was randomly generated across populations, there is no spatial correlation in abundance of individuals across the landscape, which means that the genetic spatial patterns were solely determined by the variance-covariance matrix used to generate correlated allele frequencies across populations. This facilitates interpretation of the simulation results, allowing us to demonstrate that the framework can uncover subtle spatial effects associated with population connectivity (see below).

For each spatial structure, we generated 100 matrices of allele frequencies, and each matrix was analysed separately to obtain distributions for $D_{\gamma}$, $D_{\alpha}$, $D_{\beta}$ and $\Delta_{D}$. Figure 2 presents heat maps of the correlation in allele frequencies across populations for one simulated data set under each δ value and shows that our algorithm can generate a wide range of genetic structures comparable to those generated by other more complex simulation protocols (e.g., de Villemereuil, Frichot, Bazin, Francois & Gaggiotti, 2014).

Figure 3 shows the distribution of $D_{\alpha}$, $D_{\beta}$ and $\Delta_{D}$ values for the three levels of geographic variation below the ecosystem level (i.e., $D_{\gamma}$, genetic diversity). The results clearly show that our framework detects differences in genetic diversity across different levels of spatial genetic structure. As expected, the effective number of alleles ($D_{\alpha}$ component; top row) increases per region and subregion as the spatial structure becomes weaker (i.e., from small to large δ values) but remains constant at the population level, as there is no spatial structure at this level (i.e., populations are panmictic), so diversity is independent of δ.

The $D_{\beta}$ component (middle row) quantifies the effective number of aggregates (regions, subregions, populations) at each hierarchical level of spatial subdivision. The larger the number of aggregates at a given level, the more heterogeneous that level is. Thus, it is also a measure of compositional dissimilarity at each level. We use this interpretation to describe the results in a more intuitive manner. As expected, as δ increases, dissimilarity between regions (middle left panel) decreases because spatial genetic structure becomes weaker, and the compositional dissimilarity among populations within subregions (middle right panel) increases because the strong spatial correlation among populations within subregions breaks down (Figure 3, centre left panel). The compositional dissimilarity between subregions within regions (Figure 3, middle centre panel) first increases and then decreases with increasing δ. This is due to an "edge effect" associated with the marginal status of the subregions at the extremes of the species range (extreme right and left subregions in Figure 2). As δ increases, the composition of the two subregions at the centre of the species range, which belong to different regions, changes more rapidly than that of the two marginal subregions. Thus, the compositional dissimilarity between subregions within regions increases. However, as δ continues to increase, spatial effects disappear and dissimilarity decreases.
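The step of the simulation that turns normal deviates into allele frequencies and then into allele abundances can be sketched as follows. This Python fragment is a minimal illustration, not the authors' simulation code: the names `to_frequency` and `allele_counts` are hypothetical, the population size is passed in directly rather than drawn from the log-normal and Poisson distributions, and the generation of correlated deviates from the multivariate normal distribution is left out.

```python
def to_frequency(deviate):
    """Truncate a normal deviate to [0, 1], as in the piecewise transform for p_il."""
    return min(1.0, max(0.0, deviate))

def allele_counts(deviate, n_individuals):
    """Abundances of alleles A1 and A2 for one population and one bi-allelic locus."""
    p = to_frequency(deviate)   # relative frequency of allele A1
    q = 1.0 - p                 # relative frequency of allele A2
    return n_individuals * p, n_individuals * q

# Assumed inputs: a deviate of 0.25 in a population of 40 individuals.
a1, a2 = allele_counts(0.25, 40)
```

One consequence of the truncation is that any deviate below 0 or above 1 yields a locus that is fixed for one allele in that population.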

FIGURE 2. Heat maps of allele frequency correlations between pairs of populations for different δ values. Delta values control the strength of the spatial genetic structure among populations, with low δs having the strongest spatial correlation among populations. Each heat map represents the outcome of a single simulation, and each dot represents the allele frequency correlation between two populations. Thus, the diagonal represents the correlation of a population with itself and is always 1 regardless of the δ value considered in the simulation. Colours indicate range of correlation values. As in Figure 1, the dendrograms represent the spatial relationship (i.e., geographic distance) between populations.

The differentiation component $\Delta_{D}$ (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity $D_{\beta}$. This is expected, as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix in which different δ values would be used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength in spatial genetic variation.

FIGURE 3. Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity gamma is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions).

For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ: $D_{\gamma}$ = 1.6 on average across simulations for δ = 0.1, up to $D_{\gamma}$ = 1.9 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here region, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.

7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM

All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish, Etelis coruscans (Andrews et al., 2014), and a shallow-water fish, Zebrasoma flavescens (Eble et al., 2011).

The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km southwest of the MHI at Johnston Atoll and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).

FIGURE 4. Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass.

We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.

7.1 | Species diversity

Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, $D_{\gamma}$, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_{\alpha}^{(2)}$ = 37.77; Island: $D_{\alpha}^{(1)}$ = 27.75). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_{\beta}^{(2)}$ = 1.29 represents the number of region equivalents in the Hawaiian archipelago, while $D_{\beta}^{(1)}$ = 1.361 is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain $\Delta_{D}$ (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
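The multiplicative decomposition in Table 1 can be checked numerically against the values reported in Table 4. The Python sketch below does this and, under the assumption that the share of beta information at a level can be expressed as that level's log beta diversity divided by the sum of log beta diversities, splits the total beta information between the two levels. That definition of the share is an assumption made for illustration and is not taken from the iDIP output; the variable names are likewise hypothetical.

```python
import math

# Values reported in Table 4 (fish species diversity, order q = 1).
d_gamma = 48.744
d_alpha1 = 27.752   # island level
d_beta1 = 1.361     # islands within regions
d_beta2 = 1.290     # among regions

# Table 1: D_gamma = D_alpha^(1) * D_beta^(1) * D_beta^(2), up to rounding of the inputs.
product = d_alpha1 * d_beta1 * d_beta2
relative_gap = abs(product - d_gamma) / d_gamma

# Assumed definition: share of total beta information carried by each level.
info_region = math.log(d_beta2)
info_island = math.log(d_beta1)
share_region = info_region / (info_region + info_island)
share_island = 1.0 - share_region
```

With these inputs the product reproduces the reported gamma diversity to within rounding error, and slightly more of the beta information sits among islands within regions than among regions, even though the normalized differentiation is larger among regions. This is why the normalization by the weights matters when comparing levels.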

We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than the diversities of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and, to a lesser extent, Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.

Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.

TABLE 4. Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem.

| Level | Diversity |
|---|---|
| 3: Hawaiian Archipelago | $D_{\gamma}$ = 48.744 |
| 2: Region | $D_{\gamma}^{(2)} = D_{\gamma}$, $D_{\alpha}^{(2)}$ = 37.773, $D_{\beta}^{(2)}$ = 1.290 |
| 1: Island (community) | $D_{\gamma}^{(1)} = D_{\alpha}^{(2)}$, $D_{\alpha}^{(1)}$ = 27.752, $D_{\beta}^{(1)}$ = 1.361 |

| Level | Differentiation among aggregates at each level |
|---|---|
| 2: Region | $\Delta_{D}^{(2)}$ = 0.290 |
| 1: Island (community) | $\Delta_{D}^{(1)}$ = 0.153 |

7.2 | Genetic diversity

Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that, despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.

Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.

As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).

8 | DISCUSSION

Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.

FIGURE 5. Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities; (b) genetic diversity for Etelis coruscans; (c) genetic diversity for Zebrasoma flavescens.

Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and therefore give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g., bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.
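The way Hill numbers of different orders respond to rare and dominant elements can be illustrated with a short Python sketch. The function name `hill_number` and the two frequency vectors are assumptions chosen for the example; they are not data from the Hawaiian study.

```python
import math

def hill_number(freqs, q):
    """Hill number of order q for a vector of relative frequencies."""
    present = [p for p in freqs if p > 0]
    if q == 1:
        # Limit as q -> 1: the exponential of Shannon entropy (Equation 2).
        return math.exp(-sum(p * math.log(p) for p in present))
    return sum(p ** q for p in present) ** (1.0 / (1.0 - q))

# Hypothetical assemblages with 10 elements each.
even = [0.1] * 10                  # all elements equally common
dominated = [0.55] + [0.05] * 9    # one dominant element and nine rare ones

profile_even = [hill_number(even, q) for q in (0, 1, 2)]
profile_dominated = [hill_number(dominated, q) for q in (0, 1, 2)]
```

For the even assemblage all three orders return the same effective number, whereas for the dominated assemblage the values fall steeply from q = 0 to q = 2, which is the signature of uneven frequencies described above.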

As proved by Chao et al. (2015, appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population, and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as $G_{ST}$ and Jost's D, do not possess either of these two properties.

Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.

FIGURE 6. Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness; (b) allelic richness for Etelis coruscans; (c) allelic richness for Zebrasoma flavescens.

TABLE 5. Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci.

| Level | Diversity |
|---|---|
| 3: Hawaiian Archipelago | $D_{\gamma}$ = 8.249 |
| 2: Region | $D_{\gamma}^{(2)} = D_{\gamma}$, $D_{\alpha}^{(2)}$ = 8.083, $D_{\beta}^{(2)}$ = 1.016 |
| 1: Island (population) | $D_{\gamma}^{(1)} = D_{\alpha}^{(2)}$, $D_{\alpha}^{(1)}$ = 7.077, $D_{\beta}^{(1)}$ = 1.117 |

| Level | Differentiation among aggregates at each level |
|---|---|
| 2: Region | $\Delta_{D}^{(2)}$ = 0.023 |
| 1: Island (community) | $\Delta_{D}^{(1)}$ = 0.062 |

Recently, Karlin and Smouse (2017; Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
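The two-population example can be reproduced with a few lines of Python. The sketch below applies the two-level form of the differentiation measure in Table 3, that is, the excess of pooled entropy over mean within-population entropy divided by the entropy of the weights. The function names and the equal weighting of the two populations are assumptions made for this illustration.

```python
import math

def entropy(freqs):
    """Shannon entropy with natural logs; zero frequencies contribute 0."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def differentiation(populations, weights):
    """(H_gamma - H_alpha) / (-sum_j w_j ln w_j) for populations in one aggregate."""
    n_alleles = len(populations[0])
    pooled = [sum(w * pop[i] for w, pop in zip(weights, populations))
              for i in range(n_alleles)]
    h_gamma = entropy(pooled)
    h_alpha = sum(w * entropy(pop) for w, pop in zip(weights, populations))
    return (h_gamma - h_alpha) / entropy(weights)

# Populations I and II: 10 equally frequent alleles each, 4 of them shared,
# so 16 distinct alleles in total (indices 0-3 are the shared ones).
pop_one = [0.1] * 10 + [0.0] * 6
pop_two = [0.1] * 4 + [0.0] * 6 + [0.1] * 6
delta = differentiation([pop_one, pop_two], [0.5, 0.5])
```

Here the pooled entropy exceeds the within-population entropy by 0.6 ln 2 and the weight entropy is ln 2, so the ratio recovers the nonshared proportion 1 − A/S = 1 − 4/10.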

In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would be surely observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
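The negative bias of the observed (plug-in) entropy can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings (20 equally common species, sample sizes of 30 and 3000, a fixed random seed). It does not implement the Chao and Jost (2015) estimator, and all names in it are hypothetical.

```python
import math
import random


def plug_in_entropy(counts):
    """Observed entropy: sample proportions substituted into the Shannon formula."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def mean_plug_in_entropy(true_freqs, sample_size, replicates, seed=1):
    """Average plug-in entropy over repeated multinomial samples."""
    rng = random.Random(seed)
    categories = range(len(true_freqs))
    total = 0.0
    for _ in range(replicates):
        draws = rng.choices(categories, weights=true_freqs, k=sample_size)
        counts = [0] * len(true_freqs)
        for d in draws:
            counts[d] += 1
        total += plug_in_entropy(counts)
    return total / replicates


# Assumed community: 20 equally common species, so the true entropy is ln(20).
true_freqs = [1 / 20] * 20
true_entropy = math.log(20)

small = mean_plug_in_entropy(true_freqs, sample_size=30, replicates=200)
large = mean_plug_in_entropy(true_freqs, sample_size=3000, replicates=200)

print(f"true entropy       : {true_entropy:.3f}")
print(f"mean plug-in, n=30  : {small:.3f}")
print(f"mean plug-in, n=3000: {large:.3f}")
```

In this setting the average observed entropy falls below the true value and the shortfall shrinks as the sample grows, which is why estimation matters most for severely undersampled communities.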

Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity Dβ and differentiation ΔD measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.

The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.

Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural

TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

Level                    Diversity
3 Hawaiian Archipelago   $D_\gamma = 8.404$
2 Region                 $D_\gamma^{(2)} = D_\gamma$;  $D_\alpha^{(2)} = 8.290$;  $D_\beta^{(2)} = 1.012$
1 Island (community)     $D_\gamma^{(1)} = D_\alpha^{(2)}$;  $D_\alpha^{(1)} = 7.690$;  $D_\beta^{(1)} = 1.065$

Differentiation among aggregates at each level
2 Region                 $\Delta_D^{(2)} = 0.014$
1 Island (community)     $\Delta_D^{(1)} = 0.033$


populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides “true diversity” measures that are consistent across levels of organization, and therefore they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.

In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)’s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)’s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon’s entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
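To make the link between phylogenetic entropy and diversity of order q = 1 concrete, the sketch below assumes that the transformation is exp(I/T), where I = −Σ L_i a_i ln a_i is the phylogenetic entropy and T is the depth of an ultrametric tree (notation of Appendix S2). This form is our reading of Chao et al. (2010) and should be checked against Table 3. The two toy trees and all function names are hypothetical.

```python
import math


def phylogenetic_entropy(branches):
    """I = -sum_i L_i * a_i * ln(a_i) over branches given as (length, abundance)."""
    return -sum(length * a * math.log(a) for length, a in branches if a > 0)


def mean_phylo_diversity_q1(branches, depth):
    """Assumed order-1 transformation exp(I / T) for an ultrametric tree of depth T."""
    return math.exp(phylogenetic_entropy(branches) / depth)


freqs = [0.5, 0.3, 0.2]  # hypothetical abundances of three species A, B, C
T = 2.0                  # hypothetical tree depth

# Star tree: every species hangs directly from the root on a branch of length T.
star = [(T, p) for p in freqs]

# Tree ((A,B),C): A and B share an internal branch of length 1 (abundance 0.5 + 0.3).
nested = [(1.0, 0.5), (1.0, 0.3), (1.0, 0.8), (2.0, 0.2)]

hill_q1 = math.exp(-sum(p * math.log(p) for p in freqs))
star_div = mean_phylo_diversity_q1(star, T)
nested_div = mean_phylo_diversity_q1(nested, T)

print(f"Hill number (q = 1)       : {hill_q1:.3f}")
print(f"star tree, exp(I/T)       : {star_div:.3f}")
print(f"tree ((A,B),C), exp(I/T)  : {nested_div:.3f}")
```

With a star tree, where all species are equally distinct, the value coincides with the abundance-based Hill number of order 1. Shared evolutionary history between A and B lowers it.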

There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithms may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and “true dissimilarity” properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.

The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.

An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and in doing so provides new tools to make inferences about biodiversity processes from observed spatial patterns.


ACKNOWLEDGEMENTS

This work was assisted through participation in “Next Generation Genetic Monitoring” Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center’s Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.

DATA ARCHIVING STATEMENT

All data used in this manuscript are available in DRYAD (https://doi.org/dx.doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).

ORCID

Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493

Christine Edwards http://orcid.org/0000-0001-8837-4872

REFERENCES

Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2

Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101

Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.

Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis “marshi” (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665

Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.

Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.

Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654

Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551

Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247

Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272

Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540

Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349

Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471

Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014

Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819

Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083

Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1

Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2

Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036

Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75

Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352

Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x

Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1

Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x

Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1

Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x

Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum x falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.


Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.

Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.

Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3

Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x

Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.

Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299

Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x

Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x

Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.

Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x

Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.

Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.

Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696

Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B-Biological Sciences, 283(1829).

Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a “pristine” coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z

Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765

Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x

Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.

Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422

Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756

de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705

Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.

Whitlock, M. C. (2011). G’ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x

Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516

Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.

Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593


APPENDIX S1 Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an “ideal” population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the “effective” number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
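The equivalence in the table can be verified directly. The Python sketch below uses hypothetical function names and the standard relation between expected heterozygosity and the effective number of alleles, 1/Σp² = 1/(1 − He). It recomputes both quantities for the two populations.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def effective_number_of_alleles(freqs):
    """Heterozygosity-based effective number: 1 / sum(p^2) = 1 / (1 - He)."""
    return 1.0 / sum(p * p for p in freqs)


# Allele frequencies A1..A5 taken from the table above.
population_1 = [0.5, 0.5, 0.0, 0.0, 0.0]
population_2 = [0.01, 0.10, 0.665, 0.215, 0.01]

for name, freqs in (("Population 1", population_1), ("Population 2", population_2)):
    he = expected_heterozygosity(freqs)
    ne = effective_number_of_alleles(freqs)
    print(f"{name}: He = {he:.3f}, effective number of alleles = {ne:.2f}")
```

Both populations give He close to 0.5 and an effective number close to two, even though population 2 carries five distinct alleles.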

APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol    Definition
$^qD$    abundance diversity of order q, also referred to as Hill number of order q
$^qPD$    phylogenetic diversity of order q
$S$    total number of distinct elements (species/alleles)
$H$    Shannon entropy
$K$    number of regions
$J_k$    number of local populations/communities within region k
$w_{jk}$    weight given to population/community j of region k
$w_{+k}$    weight given to region k
$D_\alpha^{(l)}$    alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$    beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$    gamma abundance diversity at level l of the hierarchy
$l$    superscript to denote population/community, subregion, region, …
$D_\gamma$    gamma abundance diversity at the ecosystem level
$N_{injk}$    number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$    total number of copies of allele/species i in population/community j of region k
$N_{+jk}$    total number of alleles/individuals in population j of region k
$N_{++k}$    total number of alleles/individuals in region k
$N_{+++}$    total number of alleles/individuals in the ecosystem
$p_{i|jk}$    frequency of allele/species i in population j of region k
$p_{i|+k}$    frequency of allele/species i in region k
$p_{i|++}$    frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$    alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$    alpha entropy for region k
$H_\alpha^{(l)}$    total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$    abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$    number of branch segments in the phylogenetic tree
$L_i$    length of branch i = 1, 2, 3, ⋯, B
$a_i$    total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$    mean branch length
$T$    depth of an ultrametric tree (= $\bar{T}$)
$Q$    Rao’s quadratic entropy
$I$    phylogenetic entropy
$a_{i|jk}$    total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$    total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$    total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$    alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$    beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$    gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$    gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$    alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$    alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$    total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$    phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)


APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term “allele frequency” with “species frequency.” We first present the details for two-level hierarchy and then extend all procedures to three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \cdots, w_J)$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j'=1}^{J} w_{j'} p_{i|j'}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1. For the gamma and alpha entropies defined above for two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows. Because $\sum_{j'=1}^{J} w_{j'} p_{i|j'} \ge w_j p_{i|j}$ for every j and the logarithm is increasing,

$$
\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j} \ln\Big(\sum_{j'=1}^{J} w_{j'} p_{i|j'}\Big)\Big) \\
&\le -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j} \ln (w_j p_{i|j})\Big) \\
&= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j \\
&= H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.
\end{aligned}
$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
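The bounds in Theorem S3.1 can be checked numerically. The following Python sketch uses hypothetical weights and allele frequencies and is not part of the original derivation. It evaluates Hγ − Hα for identical, completely distinct, and partially overlapping populations.

```python
import math


def entropy(freqs):
    """Shannon entropy with natural logarithms; zero frequencies contribute nothing."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def entropy_gap(pops, weights):
    """H_gamma - H_alpha for frequency vectors defined over the same allele set."""
    n_alleles = len(pops[0])
    pooled = [sum(w * pop[i] for w, pop in zip(weights, pops))
              for i in range(n_alleles)]
    h_alpha = sum(w * entropy(pop) for w, pop in zip(weights, pops))
    return entropy(pooled) - h_alpha


weights = [0.2, 0.3, 0.5]       # hypothetical population weights
upper_bound = entropy(weights)  # -sum_j w_j ln w_j

identical = [[0.6, 0.3, 0.1, 0.0, 0.0, 0.0]] * 3
distinct = [[0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.7, 0.3, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0, 0.4, 0.6]]
partial = [[0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
           [0.0, 0.5, 0.5, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0, 0.4, 0.6]]

for label, pops in (("identical", identical), ("distinct", distinct), ("partial", partial)):
    print(f"{label:9s}: gap = {entropy_gap(pops, weights):.4f} (upper bound {upper_bound:.4f})")
```

The gap is zero for identical populations, equals the upper bound for populations without shared alleles, and lies strictly in between when one allele is shared.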


From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and “true dissimilarity” properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation measure is always non-decreasing when a new allele is added to a single population with any abundance. Here, the population weights can be a set of specified weights or relative population sizes.
(B) In addition, Shannon differentiation also satisfies the “true dissimilarity” property: If multiple communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.

Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \tag{C2}$$
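The three entropies and the decomposition in Eq. (C2) can be sketched for a toy three-level data set. The Python code below is an illustration with hypothetical weights and allele frequencies, not the R package accompanying the article. It also converts each entropy to an effective number with exp(H) and forms the beta diversities as ratios, as in the diversity tables of the main text.

```python
import math


def entropy(freqs):
    """Shannon entropy with natural logarithms; zero frequencies contribute nothing."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def weighted_mix(vectors, weights):
    """Weighted average of frequency vectors defined over the same allele set."""
    return [sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(len(vectors[0]))]


# Hypothetical data: each region is a list of (weight w_jk, allele frequencies p_{i|jk}).
regions = [
    [(0.3, [0.7, 0.3, 0.0, 0.0]), (0.2, [0.5, 0.4, 0.1, 0.0])],
    [(0.5, [0.0, 0.1, 0.3, 0.6])],
]


def hierarchical_entropies(regions):
    """Return (H_alpha^(1), H_alpha^(2), H_gamma) for a three-level hierarchy."""
    h_alpha1 = 0.0
    h_alpha2 = 0.0
    region_freqs, region_weights = [], []
    for pops in regions:
        w_plus_k = sum(w for w, _ in pops)
        # Regional frequencies p_{i|+k}: populations weighted by w_jk / w_+k.
        p_region = weighted_mix([p for _, p in pops],
                                [w / w_plus_k for w, _ in pops])
        h_alpha1 += sum(w * entropy(p) for w, p in pops)
        h_alpha2 += w_plus_k * entropy(p_region)
        region_freqs.append(p_region)
        region_weights.append(w_plus_k)
    h_gamma = entropy(weighted_mix(region_freqs, region_weights))
    return h_alpha1, h_alpha2, h_gamma


h1, h2, hg = hierarchical_entropies(regions)
within_pop = h1          # within-population information
among_pop = h2 - h1      # among populations within regions
among_region = hg - h2   # among regions

# Effective numbers: exp(H) for alpha and gamma, ratios for beta.
d_alpha1, d_alpha2, d_gamma = math.exp(h1), math.exp(h2), math.exp(hg)
d_beta1 = d_alpha2 / d_alpha1
d_beta2 = d_gamma / d_alpha2

print(f"H components: {within_pop:.4f} + {among_pop:.4f} + {among_region:.4f} = {hg:.4f}")
print(f"D_gamma = {d_gamma:.3f} = {d_alpha1:.3f} x {d_beta1:.3f} x {d_beta2:.3f}")
```

The additive entropy components become multiplicative once expressed as effective numbers, so gamma diversity is the product of the lowest-level alpha diversity and the beta diversities of each level.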

Here $H_\alpha^{(1)}$ denotes the within-population information; $\big[H_\alpha^{(2)} - H_\alpha^{(1)}\big]$ denotes the among-population information within a region; and $\big[H_\gamma - H_\alpha^{(2)}\big]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived, so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\Big(\frac{w_{jk}}{w_{+k}}\Big). \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\big(w_{jk}/w_{+k}\big)$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $(w_{+1}, w_{+2}, \cdots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the “gamma” entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} w_{+k} p_{i|+k}\Big) \ln\Big(\sum_{k'=1}^{K} w_{+k'} p_{i|+k'}\Big).$$

The corresponding “alpha” entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations, with allele relative frequency $p_{i|jk}$ for allele i in population j and with population weights $\{w_{1k}/w_{+k},\ w_{2k}/w_{+k},\ \cdots,\ w_{J_k k}/w_{+k}\}$, $j = 1, 2, \cdots, J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem C1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

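$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \tag{C6}$$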

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, each allele occurs in a single population, so that $p_{i|++} = w_{jk}\, p_{i|jk}$ for that population, and we can decompose the gamma entropy as

$$
\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln \left( w_{jk}\, p_{i|jk} \right) \right\} \\
&= -\sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \left[ \ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k} \right] \right\} \\
&= -\sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln p_{i|jk} \right\}
 - \sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln \frac{w_{jk}}{w_{+k}} \right\}
 - \sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln w_{+k} \right\} \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right].
\end{aligned}
$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
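The two limiting cases can be checked numerically. The following is a minimal R sketch, not the authors' code: the function name `differentiation`, the count matrices and the region labels are all hypothetical, and weights are assumed to be proportional to population sample sizes. It evaluates Eq. (C5) and Eq. (C6) for populations that share no alleles and for populations with identical allele frequencies.

```r
# Normalized differentiation among regions (C5) and among populations
# within regions (C6). `abun`: alleles x populations count matrix;
# `region`: region label of each population (hypothetical helper).
differentiation <- function(abun, region) {
  shannon <- function(x) { p <- x[x > 0] / sum(x); -sum(p * log(p)) }
  w_jk <- colSums(abun) / sum(abun)          # population weights
  w_k  <- tapply(w_jk, region, sum)          # region weights w_{+k}
  pooled <- sapply(split(seq_len(ncol(abun)), region),
                   function(idx) rowSums(abun[, idx, drop = FALSE]))
  H_gamma  <- shannon(rowSums(abun))
  H_alpha2 <- sum(w_k * apply(pooled, 2, shannon))
  H_alpha1 <- sum(w_jk * apply(abun, 2, shannon))
  c(among_regions     = (H_gamma - H_alpha2) / -sum(w_k * log(w_k)),
    among_populations = (H_alpha2 - H_alpha1) /
      -sum(w_jk * log(w_jk / w_k[as.character(region)])))
}

region <- c(1, 1, 2, 2)

# Four populations, each with two private alleles (no shared alleles).
distinct <- matrix(0, nrow = 8, ncol = 4)
distinct[1:2, 1] <- c(4, 6); distinct[3:4, 2] <- c(5, 5)
distinct[5:6, 3] <- c(2, 8); distinct[7:8, 4] <- c(7, 3)

# Four populations with the same allele counts.
same <- matrix(rep(c(3, 5, 2), 4), nrow = 3)

differentiation(distinct, region)   # both measures at their maximum
differentiation(same, region)       # both measures at their minimum
```

With private alleles only, both excess entropies reach their upper bounds, so both ratios equal 1 up to floating-point error; with identical populations both numerators vanish.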

As we proved in Theorem C3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem C2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. Shannon gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i\, a_{i|++} \ln a_{i|++},$$

$$I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i\, a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i\, a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \left[I_\alpha^{(2)} - I_\alpha^{(1)}\right] + \left[I_\gamma - I_\alpha^{(2)}\right]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem C4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
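As an illustration of the bounds in Eq. (C8) and Eq. (C9), the R sketch below normalizes the two phylogenetic excess entropies by T times the corresponding weight entropy. It is only a sketch under stated assumptions: the helper `phylo_differentiation`, the branch membership matrix `M`, the toy tree (four two-tip clades attached directly to the root, all branch lengths 1, depth 2) and the counts are hypothetical and are not taken from the paper.

```r
# abun: tips x populations counts; region: region label per population;
# M: branches x tips 0/1 matrix (1 if the tip descends from the branch);
# L: branch lengths. Assumes an ultrametric tree.
phylo_differentiation <- function(abun, region, M, L) {
  phylo_entropy <- function(x) {               # I = -sum_i L_i a_i ln a_i
    a <- drop(M %*% (x / sum(x)))
    -sum((L * a * log(a))[a > 0])
  }
  w_jk <- colSums(abun) / sum(abun)
  w_k  <- tapply(w_jk, region, sum)
  pooled <- sapply(split(seq_len(ncol(abun)), region),
                   function(idx) rowSums(abun[, idx, drop = FALSE]))
  depth <- sum(L * drop(M %*% (rowSums(abun) / sum(abun))))  # tree depth T
  I_gamma  <- phylo_entropy(rowSums(abun))
  I_alpha2 <- sum(w_k * apply(pooled, 2, phylo_entropy))
  I_alpha1 <- sum(w_jk * apply(abun, 2, phylo_entropy))
  c(among_regions     = (I_gamma - I_alpha2) / (-depth * sum(w_k * log(w_k))),
    among_populations = (I_alpha2 - I_alpha1) /
      (-depth * sum(w_jk * log(w_jk / w_k[as.character(region)]))))
}

# Toy tree: 8 tip branches followed by 4 clade branches, all of length 1.
M <- rbind(diag(8), kronecker(diag(4), matrix(1, 1, 2)))
L <- rep(1, 12)
region <- c(1, 1, 2, 2)

distinct <- kronecker(diag(4), matrix(c(3, 7), 2, 1))  # one clade per population
same     <- matrix(rep(1:8, 4), nrow = 8)              # identical populations

phylo_differentiation(distinct, region, M, L)
phylo_differentiation(same, region, M, L)
```

In the first call no branch is shared between populations, so both measures reach 1; in the second call all populations are identical and both measures are 0.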

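SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represent the ecosystem. Shannon entropies are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using Eq. 1.

[Figure S1: diagram with columns Within, Between, Total and Decomposition. Level 3 (Ecosystem): pooled frequencies $p_{i|++}$ and entropy $H_\gamma$. Level 2 (Region): frequencies $p_{i|+1}$, $p_{i|+2}$, region entropies $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$ and their weighted mean $H_\alpha^{(2)}$. Level 1 (Population or Community): frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$, population entropies $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$, $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$ and their weighted mean $H_\alpha^{(1)}$.]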
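The transformation from abundances to effective numbers mentioned in the caption can be sketched in R as follows. This is an illustrative helper (the name `hill_number` and the two count vectors are hypothetical), assuming the standard Hill number formula with its limit at q = 1.

```r
# Hill number (effective number of elements) of order q for a vector of
# counts or relative abundances; zero entries are ignored.
hill_number <- function(x, q = 1) {
  p <- x[x > 0] / sum(x)
  if (abs(q - 1) < 1e-8) {
    exp(-sum(p * log(p)))          # limit q -> 1: exp(Shannon entropy)
  } else {
    sum(p^q)^(1 / (1 - q))
  }
}

even   <- c(10, 10, 10, 10)        # four equally common alleles
uneven <- c(37, 1, 1, 1)           # one dominant allele, three rare ones

sapply(c(0, 1, 2), function(q) hill_number(even, q))    # 4 for every order
sapply(c(0, 1, 2), function(q) hill_number(uneven, q))  # decreases as q grows
```

For equally common elements every order returns the number of elements; for uneven abundances the effective number shrinks as q increases, because higher orders give more weight to the common elements.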

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \cdots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \cdots, p_5,\ p_1 + p_2,\ p_1 + p_2 + p_3,\ p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2: rooted tree with tips $p_1$, $p_2$, $p_3$, $p_4$, $p_5$; internal nodes $p_1 + p_2$, $p_1 + p_2 + p_3$ and $p_4 + p_5$; the root is labelled MRCA.]
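The extended abundance set described in the caption of Figure S2 can be built with a node-by-tip membership matrix. The R sketch below follows the tree topology of Figure S2; the tip abundances and the branch lengths are hypothetical values chosen so that the tree is ultrametric with depth 3.

```r
# Relative abundances of the five tips (hypothetical values).
p <- c(p1 = 0.30, p2 = 0.20, p3 = 0.10, p4 = 0.25, p5 = 0.15)

# Membership matrix: rows are the 8 nodes/branches, columns are tips.
# An entry is 1 when the tip descends from (or is) that node.
nodes <- c(paste0("tip", 1:5), "n12", "n123", "n45")
M <- matrix(0, nrow = 8, ncol = 5, dimnames = list(nodes, names(p)))
M[cbind(1:5, 1:5)] <- 1                  # each tip contains itself
M["n12",  c("p1", "p2")] <- 1
M["n123", c("p1", "p2", "p3")] <- 1
M["n45",  c("p4", "p5")] <- 1

a <- drop(M %*% p)                       # a_i: summed abundance below each node

# Hypothetical branch lengths L_i (every root-to-tip path has length 3).
L <- c(tip1 = 1, tip2 = 1, tip3 = 2, tip4 = 2, tip5 = 2,
       n12 = 1, n123 = 1, n45 = 1)

T_bar <- sum(L * a)                      # abundance-weighted mean tip depth
a
T_bar
```

The name `T_bar` is used instead of `T` because `T` is an alias for `TRUE` in R. For an ultrametric tree the weighted mean equals the tree depth, whatever the tip abundances are.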

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

    Ecosystem
        Region 1
            Population 1
            Population 2
        Region 2
            Population 3
            Population 4
            Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

              Pop1   Pop2   Pop3   Pop4   Pop5
    Allele1      1     16      2     10     15
    Allele2      0      0      0      5     14
    Allele3      7     12     11      1      0
    Allele4      0      5     14      1     21
    Allele5      2      1      0     11     10
    Allele6      0      1      3      2      0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

              Pop1          Pop2          Pop3          Pop4          Pop5
    Level3    Ecosystem     Ecosystem     Ecosystem     Ecosystem     Ecosystem
    Level2    Region1       Region1       Region2       Region2       Region2
    Level1    Population1   Population2   Population3   Population4   Population5

For the above simple example, input data for R function iDIP (code given below) are shown below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

    # Run R function
    iDIP(Data, Struc)
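Because iDIP cannot handle blank or NA entries, the input matrices may need a small amount of preparation. The following R sketch is one possible way to do this; the helper `prepare_abun` and the toy data are hypothetical and not part of the supplied code, and replacing NA by 0 assumes that a missing entry means the allele/species was not observed.

```r
# Hypothetical raw counts with missing entries (alleles x populations).
raw <- cbind(Pop1 = c(1, NA, 7), Pop2 = c(16, 0, NA), Pop3 = c(2, 0, 11))

prepare_abun <- function(x) {
  x <- as.matrix(x)
  x[is.na(x)] <- 0                        # assumption: NA means "not observed"
  stopifnot(is.numeric(x), all(x >= 0))
  if (any(colSums(x) == 0))
    stop("every population/community needs at least one allele/species")
  x
}

Abun  <- prepare_abun(raw)
Struc <- rbind(rep("Ecosystem", 3),
               c("Region1", "Region1", "Region2"),
               paste0("Population", 1:3))
stopifnot(ncol(Struc) == ncol(Abun))      # one structure column per population
```

The last check reflects the layout described above: the structure matrix has levels in rows and one column per population, in the same order as the columns of the abundance matrix.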

Output is given below

                         [,1]
    D_gamma             5.272
    D_alpha2            4.679
    D_alpha1            3.540
    D_beta2             1.127
    D_beta1             1.322
    Proportion2         0.300
    Proportion1         0.700
    Differentiation2    0.204
    Differentiation1    0.310

We give simple interpretations for the above output ("effective" is always meant in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).

(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).

(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
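The arithmetic behind these interpretations can be re-checked from the printed values alone. The short R sketch below uses only the rounded numbers listed in the output above, so agreement is expected only up to rounding.

```r
out <- c(D_gamma = 5.272, D_alpha2 = 4.679, D_alpha1 = 3.540,
         D_beta2 = 1.127, D_beta1 = 1.322)

# Multiplicative partitioning, up to rounding of the printed values.
out[["D_alpha2"]] * out[["D_beta2"]]                       # close to D_gamma
out[["D_alpha1"]] * out[["D_beta1"]]                       # close to D_alpha2
out[["D_alpha1"]] * out[["D_beta1"]] * out[["D_beta2"]]    # close to D_gamma

# Share of the total beta information (log scale) found at each level.
beta_info <- log(out[c("D_beta2", "D_beta1")])
prop <- round(beta_info / sum(beta_info), 3)
prop
```

Because Shannon entropy is additive, the beta information at each level is the logarithm of the corresponding beta diversity, and the two shares correspond to Proportion2 and Proportion1 in the output.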

    # R code for Information-based Diversity Partitioning for Any Number of Levels
    #
    # Input data:
    #   abun:  alleles-by-population frequency matrix data (or species-by-community
    #          frequency matrix data)
    #   struc: level-by-population hierarchical structure matrix (or
    #          level-by-community hierarchical structure matrix)
    # Output:
    #   (1) Gamma (or total) diversity, alpha and beta diversity at each level
    #   (2) Proportion of total beta information found at each level
    #   (3) Differentiation (dissimilarity) for each level. For example,
    #       Differentiation1 measures the mean dissimilarity among populations
    #       (level 1) within a region; Differentiation2 measures the mean
    #       dissimilarity among regions (level 2), etc.
    # NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace
    #       "blank" or "NA" in your data by 0 or any numerical value. Also, there
    #       must be at least one species/allele in a population or a community.

    # Main function iDIP
    iDIP = function(abun, struc) {
      n = sum(abun); N = ncol(abun)
      ga = rowSums(abun)
      gp = ga[ga > 0] / n
      G = sum(-gp * log(gp))                 # gamma entropy
      S = length(gp)

      H = nrow(struc)                        # number of hierarchical levels
      A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
      Diff = numeric(H-1); Prop = numeric(H-1)

      # Lowest level: weights and alpha entropy of populations/communities
      wi = colSums(abun) / n
      W[H-1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[H-1] = sum(wi * Ai)

      # Intermediate levels: pool the populations that belong to each aggregate
      if (H > 2) {
        for (i in 2:(H-1)) {
          I = unique(struc[i, ]); NN = length(I)
          ai = matrix(0, ncol = NN, nrow = nrow(abun))
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) { ai[, j] = abun[, II]
            } else { ai[, j] = rowSums(abun[, II]) }
          }
          pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
          wi = colSums(ai) / sum(ai)
          W[i-1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
          A[i-1] = sum(wi * Ai)
        }
      }

      # Differentiation, proportion of beta information and beta diversity
      total = G - A[H-1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H-1)) {
          Diff[i] = (A[i-1] - A[i]) / (W[i] - W[i-1])
          Prop[i] = (A[i-1] - A[i]) / total
          B[i] = exp(A[i-1]) / exp(A[i])
        }
      }

      Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
      out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) <- c(paste0("D_gamma"),
                         paste0("D_alpha", (H-1):1),
                         paste0("D_beta", (H-1):1),
                         paste0("Proportion", (H-1):1),
                         paste0("Differentiation", (H-1):1))
      return(out)
    }

Phylogenetic Diversity (R function iDIP.phylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described in the species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree format. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see the simple example given above for the species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity for illustration. A simulated phylogenetic tree for 6 species is given below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    rownames(Data) = paste0("Allele",1:6)
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
    Tree = c("(((Allele1:1.666254448,Allele2:2.886156926):4.370264926,Allele3:5.919367445):4.349065302,(Allele4:0.967060281,Allele5:4.965919121,Allele6:1.5361314):5.492297125);")

    # Run R function
    iDIP.phylo(Data, Struc, Tree)

Output is given below

                     [,1]
    Faith's PD    32.1525
    mean_T         9.4169
    PD_gamma      27.4388
    PD_alpha2     25.5194
    PD_alpha1     22.3231
    PD_beta2       1.075
    PD_beta1       1.143
    PD_prop2       0.351
    PD_prop1       0.649
    PD_diff2       0.124
    PD_diff1       0.149

The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 32.1525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 9.4169.

(3a) PD_gamma = 27.4388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 27.4388.

(3b) PD_alpha2 = 25.5194 means that the effective total branch length per region is 25.5194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 25.5194 x 1.075 = 27.4388 (= PD_gamma).

(3c) PD_alpha1 = 22.3231 means that the effective total branch length per population within each region is 22.3231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 22.3231 x 1.143 = 25.5194 (= PD_alpha2).

(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.

    # R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
    #
    # Input data:
    #   abun:  alleles-by-population frequency matrix data (or species-by-community
    #          frequency matrix data)
    #   struc: level-by-population hierarchical structure matrix (or
    #          level-by-community hierarchical structure matrix)
    #   tree:  a Newick-format phylogenetic tree spanned by all focal species
    #          considered in a study
    # NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank"
    #       or "NA" in your data by 0 or any numerical value. Also, there must be
    #       at least one species/allele in a population or a community.
    # Output:
    #   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
    #   (2) mean_T: weighted (by species abundance) mean of the distances from the
    #       root node to each of the tips in a phylogenetic tree. For an
    #       ultrametric tree, mean_T = tree depth
    #   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and
    #       beta PD for each level
    #   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic
    #       beta information found in Level 1 and Level 2, respectively
    #   (5) Phylogenetic differentiation (dissimilarity) for each level. For
    #       example, PD_diff1 measures the mean phylogenetic dissimilarity among
    #       communities (Level 1) within a region (Level 2); PD_diff2 measures the
    #       mean phylogenetic dissimilarity among regions (Level 2), etc.

    # Three packages "ade4", "ape" and "phytools" must be installed first
    install.packages("ade4"); library(ade4)
    install.packages("ape"); library(ape)
    install.packages("phytools"); library(phytools)

    # Main function iDIP.phylo
    iDIP.phylo = function(abun, struc, tree) {
      phyloData <- newick2phylog(tree)
      Temp <- as.matrix(abun[names(phyloData$leaves), ])
      nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
      # M[i, b] = 1 if branch b lies on the path from the root to leaf i
      M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
                 dimnames = list(names(phyloData$leaves), nodenames))
      for (i in 1:length(phyloData$leaves)) {
        M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
      }

      # pA: node/branch abundances in each population; pB: branch lengths
      pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
                  dimnames = list(nodenames, colnames(abun)))
      for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
      pB = c(phyloData$leaves, phyloData$nodes)

      n = sum(abun); N = ncol(abun)
      ga = rowSums(pA)
      gp = ga / n; TT = sum(gp * pB)         # TT: mean branch length (mean_T)
      G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
      PD = sum(pB[gp > 0])                   # Faith's PD

      H = nrow(struc)
      A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
      Diff = numeric(H-1); Prop = numeric(H-1)

      wi = colSums(abun) / n
      W[H-1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[H-1] = sum(wi * Ai)

      if (H > 2) {
        for (i in 2:(H-1)) {
          I = unique(struc[i, ]); NN = length(I)
          pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) { pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
            } else { pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II]) }
          }
          # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
          wi = ni / sum(ni)
          W[i-1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
          A[i-1] = sum(wi * Ai)
        }
      }

      total = G - A[H-1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H-1)) {
          Diff[i] = (A[i-1] - A[i]) / (W[i] - W[i-1])
          Prop[i] = (A[i-1] - A[i]) / total
          B[i] = exp(A[i-1]) / exp(A[i])
        }
      }

      Gamma = exp(G) * TT; Alpha = exp(A) * TT; Diff = Diff; Prop = Prop
      # Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
      out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
      # out1 = iDIP(abun, struc); out = cbind(out1, out2)
      rownames(out) <- c(paste("Faith's PD"),
                         paste("mean_T"),
                         paste0("PD_gamma"),
                         paste0("PD_alpha", (H-1):1),
                         paste0("PD_beta", (H-1):1),
                         paste0("PD_prop", (H-1):1),
                         paste0("PD_diff", (H-1):1))
      return(out)
    }


and superscript q is the order of the diversity and indicates the sensitivity of qD, the numbers equivalent of the diversity measure being used, to common and rare elements (Jost, 2006). The diversity of order zero (q = 0) is completely insensitive to species or allele frequencies and is known, respectively, as species or allelic richness, depending on whether it is applied to species or allele frequency data. The diversity of order one (q = 1) weights the contribution of each species or allele by its frequency, without favouring either common or rare species/alleles. Although Equation 1 is not defined for q = 1, its limit exists (Jost, 2006):

$${}^{1}D = \exp\left(-\sum_{i=1}^{S} p_i \ln p_i\right) = \exp(H), \tag{2}$$

where H is the Shannon entropy. All values of q greater than unity disproportionally favour the most common species or allele. For example, the Simpson concentration and the Gini–Simpson index, which are, respectively, equivalent to expected homozygosity and expected heterozygosity when applied to allele frequency data, lead to diversities of order 2 and give the same effective number of species or alleles:

$${}^{2}D = 1 \Big/ \left(\sum_{i=1}^{S} p_i^{2}\right). \tag{3}$$

It is worth emphasizing that, among all these different number equivalents or true diversity measures, the diversity of order 1 is key because of its ability to weigh elements precisely by their frequency, without favouring either rare or common elements (Jost, 2006). Therefore, we will use this measure to define our new framework for diversity decomposition.

4 | WEIGHTED INFORMATION-BASED DECOMPOSITION FRAMEWORK (q = 1)

Our decomposition framework is focused on the information-based diversity measure (Hill number of order q = 1). In what follows, we first describe the framework in terms of abundance (species/genetic) diversities, and then we provide an equivalent formulation in terms of phylogenetic diversity. For simplicity, we will use the notation D to refer to abundance diversities and PD to refer to phylogenetic diversities, both of order q = 1. Appendix S2 lists all notation and definitions of the parameters and variables we used.

4.1 | Formulation in terms of abundance diversity

Here, we develop a framework applicable to both species (abundance, presence–absence, biomass) and genetic data to estimate alpha, beta and gamma diversities (i.e., diversity components) across different levels of a hierarchical spatial structure. In this section, we consider a very simple example of an ecosystem subdivided into multiple regions, each of which, in turn, is subdivided into a number of communities when considering species data, or a number of populations when considering genetic data. However, our formulation is applicable to any number of levels within a spatially hierarchical partitioning scheme and their associated number of communities and populations at each level (nested scale), such as the example considered in our simulation study below (see Figure 1). Indeed, the framework described here allows decomposing species and genetic information on an equal footing, thus allowing contrasting diversity components across communities and populations. In other words, if genetic and species abundance (or presence–absence) data are available for every population and every species, then genetic and species diversity components can be contrasted within and among spatial scales as well as across different phylogenetic levels. Note that our proposed framework is based on diversities of order q = 1, which are less sensitive than diversities of higher order to the fact that genetic information is not available for all individuals in a population, but rather based on subsamples of individuals within populations. As such, using q = 1 allows one to decompose genetic variation consistently across different spatial subdivision levels that may vary in abundance.

The final objective was to decompose the global (ecosystem) diversity into its regional and community/population-level components. We do this using the well-known additive property of Shannon entropy across hierarchical levels (and thus multiplicative partitioning of diversity) (Batty, 1976; Jost, 2007). Table 1 presents the diversities (number equivalents) that need to be estimated at each level of the hierarchy. For each level, there will be one value corresponding to species diversity and another corresponding to allelic (genetic) diversity of a particular species at a given locus (or an average across loci). Figure S1 provides a schematic representation of the calculation of diversities.

From Table 1, it is apparent that we only need to use Equation 2 to calculate three diversity indices, namely $D_\alpha^{(1)}$, $D_\alpha^{(2)}$ and $D_\gamma$. These diversity measures are defined in terms of relative abundances of the distinct elements (species or alleles) at the respective levels of the hierarchy. In what follows, we first present the framework as applied to allele count data and then explain how a simple change in the definition of a single parameter allows the application of the same framework to species abundance data. We assume that we are considering a diploid species (but the scheme can be easily generalized for polyploid species) and focus on the diversity of order q = 1, which is based on the Shannon entropy (see Equation 1).

Genetic diversity indices are calculated separately for each locus, so we focus here on a locus with S alleles. Additionally, we consider an ecosystem subdivided into K regions, each having $J_k$ local populations. Let $N_{injk}$ be the number of diploid individuals with n (= 0, 1, 2) copies of allele i in population j and region k. Then, the total number of copies of allele i in population j and region k is $N_{ijk} = \sum_{n=0}^{2} n N_{injk}$, and from this, we can derive the total number of alleles in population j and region k as $N_{+jk} = \sum_{i=1}^{S} N_{ijk}$, the total number of alleles in region k as $N_{++k} = \sum_{j=1}^{J_k} N_{+jk}$, and the total number of alleles in the ecosystem as $N_{+++} = \sum_{k=1}^{K} N_{++k}$. All allele frequencies can be derived from these allele counts. For example, the relative frequency of allele i in any given population j within region k is $p_{i|jk} = N_{ijk}/N_{+jk}$. In the case of region- and ecosystem-level allele frequencies, we pool over populations within regions and over all regions and populations within an ecosystem, respectively. We define the weight for population j and region k as $w_{jk} = N_{+jk}/N_{+++}$; the weight for region k thus becomes $w_{+k} = \sum_{j=1}^{J_k} w_{jk} = N_{++k}/N_{+++}$. Table 2 describes how allele/species relative frequencies at each level are calculated in terms of these weight functions.
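The bookkeeping from genotype counts to weights and relative frequencies can be sketched in R as follows. All data here are hypothetical (one biallelic locus, three populations in two regions), and the object names are illustrative only.

```r
# N_geno[i, n + 1, j] = number of diploid individuals in population j
# carrying n (= 0, 1, 2) copies of allele i (hypothetical data).
N_geno <- array(0, dim = c(2, 3, 3),
                dimnames = list(paste0("allele", 1:2), paste0("n", 0:2),
                                paste0("pop", 1:3)))
N_geno["allele1", , ] <- cbind(c(2, 4, 4), c(5, 10, 5), c(9, 1, 0))
N_geno["allele2", , ] <- cbind(c(4, 4, 2), c(5, 10, 5), c(0, 1, 9))
region <- c(pop1 = "R1", pop2 = "R1", pop3 = "R2")

copies <- 0:2
# N_ijk = sum_n n * N_injk: allele copies per population.
N_ijk <- apply(N_geno, c(1, 3), function(v) sum(copies * v))

N_pjk <- colSums(N_ijk)                   # N_{+jk}
w_jk  <- N_pjk / sum(N_pjk)               # population weights
w_k   <- tapply(w_jk, region, sum)        # region weights w_{+k}

p_ijk <- sweep(N_ijk, 2, N_pjk, "/")      # population-level frequencies
p_ik  <- sapply(split(names(region), region), function(pops)
  rowSums(N_ijk[, pops, drop = FALSE]) / sum(N_ijk[, pops]))
p_i   <- rowSums(N_ijk) / sum(N_ijk)      # ecosystem-level frequencies
```

Pooling counts within a region gives the same regional frequencies as the weighted average of population frequencies with weights $w_{jk}/w_{+k}$, which is the relationship summarized in Table 2.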

Using these frequencies, we can calculate the genetic diversities at each level of spatial organization. Table 3 presents the formulas for $D_\alpha^{(1)}$, $D_\alpha^{(2)}$ and $D_\gamma$; all other diversity measures can be derived from them (see Table 1). In the case of the ecosystem diversity, this amounts to simply replacing $p_i$ in Equation 2 by $p_{i|++}$, the allele frequency at the ecosystem level (see Table 2). To calculate the diversity at the regional level, we first calculate the entropy $H_{\alpha k}^{(2)}$ for each individual region k and then obtain the weighted average over all regions, $H_\alpha^{(2)}$. Finally, we calculate the exponential of the region-level entropy to obtain $D_\alpha^{(2)}$, the alpha diversity at the regional level. We proceed in a similar fashion to obtain $D_\alpha^{(1)}$, the diversity at the population level, but in this case, we need to average over regions and populations within regions.

The calculation of the equivalent diversities based on species count data can be carried out using the exact same procedure described above, but in this case, $N_{ijk}$ represents the number of individuals of species i in population j and region k. All formulas for gamma, alpha and beta, along with the differentiation measures at each level, are given in Table 3. The formulas can be directly generalized to any arbitrary number of levels (see Section 5).
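The three steps just described (entropy per aggregate, weighted average, exponential) can be sketched in R for a three-level hierarchy. The count matrix below is the small example data set used in the supplementary R code (six alleles, five populations, two regions); the object names and the `shannon` helper are illustrative and not part of the authors' function.

```r
abun <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)               # region of each population

shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

w_jk <- colSums(abun) / sum(abun)        # population weights
w_k  <- tapply(w_jk, region, sum)        # region weights

# Step 1: entropy of each region (pooled counts) and of each population.
region_counts <- sapply(split(seq_len(ncol(abun)), region),
                        function(idx) rowSums(abun[, idx, drop = FALSE]))
H_region <- apply(region_counts, 2, shannon)
H_pop    <- apply(abun, 2, shannon)

# Step 2: weighted averages give the alpha entropies; pooling all gives gamma.
H_alpha2 <- sum(w_k * H_region)
H_alpha1 <- sum(w_jk * H_pop)
H_gamma  <- shannon(rowSums(abun))

# Step 3: exponentiate to obtain effective numbers, then the beta components.
D <- c(D_gamma = exp(H_gamma), D_alpha2 = exp(H_alpha2), D_alpha1 = exp(H_alpha1))
D_beta2 <- D[["D_gamma"]] / D[["D_alpha2"]]
D_beta1 <- D[["D_alpha2"]] / D[["D_alpha1"]]
round(c(D, D_beta2 = D_beta2, D_beta1 = D_beta1), 3)
```

The results should agree, up to rounding, with the iDIP output listed in the supplementary information for the same data (D_gamma 5.272, D_alpha2 4.679, D_alpha1 3.540).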

4.2 | Formulation in terms of phylogenetic diversity

We first present an overview of phylogenetic diversity measures applied to a single nonhierarchical case, henceforth referred to as single aggregate for brevity, and then extend it to consider a hierarchically structured system.

4.2.1 | Phylogenetic diversity measures in a single aggregate

To formulate phylogenetic diversity in a single aggregate, we assume that all species or alleles in an aggregate are connected by a rooted ultrametric or nonultrametric phylogenetic tree, with all species/alleles as tip nodes. All phylogenetic diversity measures discussed below are computed from a given fixed tree base (or a time reference point) that is ancestral to all species/alleles in the aggregate. A convenient time reference point is the age of the root of the phylogenetic tree spanned by all elements. Assume that there are B branch segments in the tree, and thus there are B corresponding nodes, B ≥ S. The set of species/alleles is expanded to include also the internal nodes as well as the terminal nodes representing species/alleles, which will then be the first S elements (see Figure S2).

FIGURE 1 The spatial representation of 32 populations organized into a spatial hierarchy based on three scale levels: subregions (eight populations each), regions (16 populations each) and the ecosystem (all 32 populations). The dendrogram (upper panel: hierarchical representation of levels) represents the spatial relationship (i.e., geographic distance) in which each tip represents a population found in a particular site (lower panel). The cartographic representation (lower panel) represents the spatial distribution of these same populations along a geographic coordinate system.

Let $L_i$ denote the length of branch i in the tree, i = 1, 2, …, B. We first expand the set of relative abundances of elements $(p_1, p_2, \cdots, p_S)$ (see Equation 1) to a larger set $\{a_i,\ i = 1, 2, \cdots, B\}$ by defining $a_i$ as the total relative abundance of the elements descended from the ith node/branch, i = 1, 2, …, B. In phylogenetic diversity, an important parameter is the mean branch length $\bar{T}$, the abundance-weighted mean of the distances from the tree base to each of the terminal branch tips, that is, $\bar{T} = \sum_{i=1}^{B} L_i a_i$. For an ultrametric tree, the mean branch length is simply reduced to the tree depth T; see Figure 1 in Chao, Chiu, and Jost (2010) for an example. For simplicity, our following formulation of phylogenetic diversity is based on ultrametric trees. The extension to nonultrametric trees is straightforward (via replacing T by $\bar{T}$ in all formulas).

Chao et al. (2010, 2014) generalized Hill numbers to a class of phylogenetic diversity of order q, qPD, derived as

$${}^{q}PD = \left[\sum_{i=1}^{B} L_i \left(\frac{a_i}{T}\right)^{q}\right]^{1/(1-q)}. \tag{4}$$

This measure quantifies the effective total branch length during the time interval from T years ago to the present. If q = 0, then ${}^{0}PD = \sum_{i=1}^{B} L_i$, which is the well-known Faith's PD, the sum of the branch lengths of a phylogenetic tree connecting all species. However, this measure does not consider species abundances. Rao's quadratic entropy Q (Rao & Nayak, 1985) is a widely used measure which takes into account both phylogeny and species abundances. This measure is a generalization of the Gini–Simpson index and quantifies the average phylogenetic distance between any two individuals randomly selected from the assemblage. Chao et al. (2010) showed that the qPD measure of order q = 2 is a simple transformation of quadratic entropy, that is, ${}^{2}PD = T/(1 - Q/T)$. Again, here we focus on the qPD measure of order q = 1, which can be expressed as a function of the phylogenetic entropy (Allen, Kon, & Bar-Yam, 2009):

$${}^{1}PD = \lim_{q \to 1} {}^{q}PD = \exp\left[-\sum_{i=1}^{B} L_i \frac{a_i}{T} \ln\left(\frac{a_i}{T}\right)\right] \equiv T \exp(I/T). \tag{5}$$

Here, I denotes the phylogenetic entropy:

$$I = -\sum_{i=1}^{B} L_i a_i \ln a_i, \tag{6}$$

which is a generalization of Shannon's entropy that incorporates phylogenetic distances among elements. Note that when there are only tip nodes and all branches have unit length, then we have T = 1 and qPD reduces to the Hill number of order q (in Equation 1).

4.2.2 | Phylogenetic diversity decomposition in a multiple-level hierarchically structured system

The single-aggregate formulation can be extended to consider a hierarchical spatially structured system. For the sake of simplicity, we consider three levels (ecosystem, region and community/population), as we did for the species/allelic diversity decomposition. Assume that there are S elements in the ecosystem. For the rooted phylogenetic tree spanned by all S elements in the ecosystem, we define root (or a time reference point), number of nodes/branches B and branch length $L_i$ in a similar manner as those in a single aggregate.

For the tip nodes, as in the framework of species and allelic diversity (in Table 2), define $p_{i|jk}$, $p_{i|+k}$ and $p_{i|++}$, i = 1, 2, …, S, as the ith species or allele relative frequencies at the population, regional and ecosystem level, respectively. To expand these relative frequencies to the branch set, we define $a_{i|jk}$, i = 1, 2, …, B, as the summed relative abundance of the species/alleles descended from the ith node/branch in population j and region k, with similar definitions for $a_{i|+k}$ and $a_{i|++}$, i = 1, 2, …, B; see Figure 1 of Chao et al. (2015) for an illustrative example. The decomposition for phylogenetic diversity is similar to that for Hill numbers presented in Table 1, except that now all measures are replaced by phylogenetic diversity. The corresponding phylogenetic gamma, alpha and beta diversities at each level are given in Table 3, along with the corresponding differentiation measures. Appendix S3 presents all mathematical derivations and discusses the desirable monotonicity and "true dissimilarity" properties that our proposed differentiation measures possess.

TABLE 1 Various diversities in a hierarchically structured system and their decomposition based on diversity measure D = 1D (Hill number of order q = 1 in Equation 2); for phylogenetic diversity decomposition, replace D with PD = 1PD (phylogenetic diversity measure of order q = 1 in Equation 5); see Table 3 for all formulas for D and PD. The superscripts (1) and (2) denote the hierarchical level of focus.

    Hierarchical level            | Within           | Between                                             | Total                                 | Decomposition
    3 Ecosystem                   | −                | −                                                   | $D_\gamma$                            | $D_\gamma = D_\alpha^{(1)} D_\beta^{(1)} D_\beta^{(2)}$
    2 Region                      | $D_\alpha^{(2)}$ | $D_\beta^{(2)} = D_\gamma^{(2)} / D_\alpha^{(2)}$   | $D_\gamma^{(2)} = D_\gamma$           | $D_\gamma = D_\alpha^{(2)} D_\beta^{(2)}$
    1 Community or population     | $D_\alpha^{(1)}$ | $D_\beta^{(1)} = D_\gamma^{(1)} / D_\alpha^{(1)}$   | $D_\gamma^{(1)} = D_\alpha^{(2)}$     | $D_\alpha^{(2)} = D_\alpha^{(1)} D_\beta^{(1)}$

TABLE 2 Calculation of allele/species relative frequencies at the different levels of the hierarchical structure.

    Hierarchical level | Species/allele relative frequency
    Population         | $p_{i|jk} = N_{ijk}/N_{+jk} = N_{ijk} / \sum_{i=1}^{S} N_{ijk}$
    Region             | $p_{i|+k} = N_{i+k}/N_{++k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$
    Ecosystem          | $p_{i|++} = N_{i++}/N_{+++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk}$
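Equations 4 to 6 for a single aggregate can be illustrated with a short R sketch. The function name `phylo_hill`, the three-tip tree ((1,2),3) and its branch lengths and abundances are hypothetical; the mean branch length is computed as the abundance-weighted sum of branch lengths, which equals the tree depth T for an ultrametric tree.

```r
# Phylogenetic diversity of order q (Eq. 4), with the q -> 1 limit (Eq. 5).
# L: branch lengths; a: node/branch abundances (same order as L).
phylo_hill <- function(L, a, q = 1) {
  keep <- a > 0
  L <- L[keep]; a <- a[keep]
  T_bar <- sum(L * a)                    # equals the depth T if ultrametric
  if (abs(q - 1) < 1e-8) {
    exp(-sum(L * (a / T_bar) * log(a / T_bar)))
  } else {
    sum(L * (a / T_bar)^q)^(1 / (1 - q))
  }
}

# Hypothetical tree ((1,2),3) of depth 2: tips 1 and 2 have length 1,
# tip 3 has length 2, and the branch above tips 1 and 2 has length 1.
L <- c(1, 1, 2, 1)
a <- c(0.5, 0.3, 0.2, 0.8)               # last entry = 0.5 + 0.3

phylo_hill(L, a, q = 0)                  # Faith's PD: sum of branch lengths
phylo_hill(L, a, q = 1)                  # effective total branch length

# Check of the identity in Eq. 5, using I from Eq. 6.
I <- -sum(L * a * log(a))
2 * exp(I / 2)

# Only tip nodes with unit branch lengths: reduces to the ordinary Hill number.
p <- c(0.5, 0.3, 0.2)
phylo_hill(rep(1, 3), p, q = 2)
1 / sum(p^2)
```

The last two lines give the same value, which matches the remark that qPD reduces to the Hill number of order q when T = 1.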

5 | IMPLEMENTATION OF THE FRAMEWORK BY MEANS OF AN R PACKAGE

The framework described above has been implemented in the R function iDIP (information-based Diversity Partitioning), which is provided as Data S1. We also provide a short introduction with a simple example data set to explain how to obtain numerical results equivalent to those provided in Tables 4 and 5 below for the Hawaiian archipelago example data set.

The R function iDIP requires two input matrices:

1. Abundance data specifying species/alleles (rows) raw or relative abundances for each population/community (columns).

2. Structure matrix describing the hierarchical structure of spatial subdivision; see a simple example given in Data S1. There is no limit to the number of spatial subdivisions.

The output includes (i) gamma (or total) diversity, alpha and beta diversity for each level; (ii) proportion of total beta information (among aggregates) found at each level; and (iii) mean differentiation (dissimilarity) at each level.
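To make the two inputs concrete, here is a hypothetical Python sketch of how an abundance matrix and a structure matrix might be laid out and combined. The layout (one row per hierarchical level, one column per population), the labels and the helper `pool_by_level` are assumptions made for illustration only; they are not the iDIP interface, whose exact input format is described in Data S1.

```python
# Rows are species/alleles, columns are populations (hypothetical counts).
abundance = [
    [8, 1, 5],   # species 1 in Pop1, Pop2, Pop3
    [2, 9, 5],   # species 2 in Pop1, Pop2, Pop3
]

# One row per hierarchical level, one column per population (assumed layout).
structure = [
    ["Ecosystem", "Ecosystem", "Ecosystem"],
    ["Region1", "Region1", "Region2"],
    ["Pop1", "Pop2", "Pop3"],
]


def pool_by_level(abundance, labels):
    """Sum population columns that share the same label at one level."""
    groups = {}
    for col, label in enumerate(labels):
        pooled = groups.setdefault(label, [0] * len(abundance))
        for row, counts in enumerate(abundance):
            pooled[row] += counts[col]
    return groups


region_counts = pool_by_level(abundance, structure[1])
ecosystem_counts = pool_by_level(abundance, structure[0])
```

Pooling the columns level by level in this way yields the aggregate abundances from which the gamma, alpha and beta diversities at each level are computed.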

We also provide the R function iDIP.phylo, which implements an information-based decomposition of phylogenetic diversity and, therefore, can take into account the evolutionary history of the species being studied. This function requires the two matrices mentioned above plus a phylogenetic tree in Newick format. For interested users without knowledge of R, we also provide an online version available from https://chao.shinyapps.io/iDIP/. This interactive web application was developed using Shiny (https://shiny.rstudio.com). The webpage contains tabs providing a short introduction describing how to use the tool, along with a detailed User's Guide, which provides proper interpretations of the output through numerical examples.

6 | SIMULATION STUDY TO SHOW THE CHARACTERISTICS OF THE FRAMEWORK

Here, we describe a simple simulation study to demonstrate the utility and numerical behaviour of the proposed framework. We considered an ecosystem composed of 32 populations divided into four hierarchical levels (ecosystem, region, subregion, population; Figure 1). The number of populations at each level was kept constant across all simulations (i.e., ecosystem with 32 populations, regions with 16 populations each and subregions with eight

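TABLE 3 Formulas for α, β and γ along with differentiation measures at each hierarchical level of spatial subdivision for species/allelic diversity and phylogenetic diversity. Here $D = {}^{1}D$ (Hill number of order q = 1 in Equation 2), $PD = {}^{1}PD$ (phylogenetic diversity of order q = 1 in Equation 5), T denotes the depth of an ultrametric tree, H = Shannon entropy (Equation 2), I = phylogenetic entropy (Equation 6)

| Hierarchical level | Diversity | Species/allelic diversity | Phylogenetic diversity |
|---|---|---|---|
| Level 3: Ecosystem | gamma | $D_\gamma = \exp\left(-\sum_{i=1}^{S} p_{i\mid ++} \ln p_{i\mid ++}\right) \equiv \exp(H_\gamma)$ | $PD_\gamma = T \times \exp\left(-\sum_{i=1}^{B} L_i a_{i\mid ++} \ln a_{i\mid ++} / T\right) \equiv T \times \exp(I_\gamma / T)$ |
| Level 2: Region | gamma | $D_\gamma^{(2)} = D_\gamma$ | $PD_\gamma^{(2)} = PD_\gamma$ |
| | alpha | $D_\alpha^{(2)} = \exp(H_\alpha^{(2)})$, where $H_\alpha^{(2)} = \sum_k w_{+k} H_{\alpha k}^{(2)}$ and $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i\mid +k} \ln p_{i\mid +k}$ | $PD_\alpha^{(2)} = T \times \exp(I_\alpha^{(2)} / T)$, where $I_\alpha^{(2)} = \sum_k w_{+k} I_{\alpha k}^{(2)}$ and $I_{\alpha k}^{(2)} = -\sum_{i=1}^{B} L_i a_{i\mid +k} \ln a_{i\mid +k}$ |
| | beta | $D_\beta^{(2)} = D_\gamma^{(2)} / D_\alpha^{(2)}$ | $PD_\beta^{(2)} = PD_\gamma^{(2)} / PD_\alpha^{(2)}$ |
| Level 1: Population or community | gamma | $D_\gamma^{(1)} = D_\alpha^{(2)}$ | $PD_\gamma^{(1)} = PD_\alpha^{(2)}$ |
| | alpha | $D_\alpha^{(1)} = \exp(H_\alpha^{(1)})$, where $H_\alpha^{(1)} = \sum_{j,k} w_{jk} H_{\alpha jk}^{(1)}$ and $H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i\mid jk} \ln p_{i\mid jk}$ | $PD_\alpha^{(1)} = T \times \exp(I_\alpha^{(1)} / T)$, where $I_\alpha^{(1)} = \sum_{j,k} w_{jk} I_{\alpha jk}^{(1)}$ and $I_{\alpha jk}^{(1)} = -\sum_{i=1}^{B} L_i a_{i\mid jk} \ln a_{i\mid jk}$ |
| | beta | $D_\beta^{(1)} = D_\gamma^{(1)} / D_\alpha^{(1)}$ | $PD_\beta^{(1)} = PD_\gamma^{(1)} / PD_\alpha^{(1)}$ |

Differentiation among aggregates at each level

| Hierarchical level | Species/allelic diversity | Phylogenetic diversity |
|---|---|---|
| Level 2: Among regions | $\Delta_D^{(2)} = \dfrac{H_\gamma - H_\alpha^{(2)}}{-\sum_k w_{+k} \ln w_{+k}}$ | $\Delta_{PD}^{(2)} = \dfrac{I_\gamma - I_\alpha^{(2)}}{-T \sum_k w_{+k} \ln w_{+k}}$ |
| Level 1: Population/community within region | $\Delta_D^{(1)} = \dfrac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{j,k} w_{jk} \ln(w_{jk} / w_{+k})}$ | $\Delta_{PD}^{(1)} = \dfrac{I_\alpha^{(2)} - I_\alpha^{(1)}}{-T \sum_{j,k} w_{jk} \ln(w_{jk} / w_{+k})}$ |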
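The species/allelic column of Table 3 can be illustrated with a short Python sketch for a three-level hierarchy. The function name `decompose`, the nested-list inputs and the toy data are assumptions made for this example rather than the authors' implementation; the weights $w_{jk}$ are assumed to sum to one over all populations.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a frequency vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def decompose(p_pop, w_pop):
    """p_pop[k][j] = frequency vector of population j in region k;
    w_pop[k][j] = weight w_jk of that population."""
    n_items = len(p_pop[0][0])
    n_regions = len(p_pop)
    w_region = [sum(w) for w in w_pop]
    p_region = [
        [sum(w * p[i] for w, p in zip(w_pop[k], p_pop[k])) / w_region[k]
         for i in range(n_items)]
        for k in range(n_regions)
    ]
    p_eco = [sum(w_region[k] * p_region[k][i] for k in range(n_regions))
             for i in range(n_items)]

    h_gamma = entropy(p_eco)
    h_alpha2 = sum(w_region[k] * entropy(p_region[k]) for k in range(n_regions))
    h_alpha1 = sum(w * entropy(p)
                   for k in range(n_regions)
                   for w, p in zip(w_pop[k], p_pop[k]))

    # Normalizing constants of the differentiation measures
    norm2 = -sum(w * math.log(w) for w in w_region)
    norm1 = -sum(w * math.log(w / w_region[k])
                 for k in range(n_regions) for w in w_pop[k])
    return {
        "D_gamma": math.exp(h_gamma),
        "D_alpha2": math.exp(h_alpha2),
        "D_beta2": math.exp(h_gamma - h_alpha2),
        "D_alpha1": math.exp(h_alpha1),
        "D_beta1": math.exp(h_alpha2 - h_alpha1),
        "Delta2": (h_gamma - h_alpha2) / norm2 if norm2 > 0 else float("nan"),
        "Delta1": (h_alpha2 - h_alpha1) / norm1 if norm1 > 0 else float("nan"),
    }


# Toy example: two regions sharing no species, each holding two identical
# populations with two equally common species.
p_pop = [[[0.5, 0.5, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]],
         [[0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.5, 0.5]]]
w_pop = [[0.25, 0.25], [0.25, 0.25]]
result = decompose(p_pop, w_pop)
```

In this toy case the two regions are completely distinct, so the sketch gives two region equivalents and complete differentiation among regions, whereas populations within a region are identical and show no differentiation.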


populations each). Note that here we used a hierarchy with four spatial subdivisions instead of three levels as used in the presentation of the framework. This decision was based on the fact that we wanted to simplify the presentation of calculations (three levels used), and in the simulations (four levels used) we wanted to verify the performance of the framework in a more in-depth manner.

We explored six scenarios varying in the degree of genetic structuring from very strong (Figure 2, top left panel) to very weak (Figure 2, bottom right panel), and for each we generated spatially structured genetic data for 10 unlinked bi-allelic loci using an algorithm loosely based on the genetic model of Coop, Witonsky, Di Rienzo, and Pritchard (2010). More explicitly, to generate correlated allele frequencies across populations for bi-allelic loci, we drew 10 random vectors of dimension 32 from a multivariate normal distribution with mean zero and a covariance matrix corresponding to the particular genetic structure scenario being considered. To construct the covariance matrix, we first assumed that the covariance between populations decreased with distance, so that the off-diagonal elements (covariances) for closest geographic neighbours were set to 4, those for the second nearest neighbours were set to 3, and so on; as such, the main diagonal values (variance) were set to 5. By multiplying the off-diagonal elements of this variance–covariance matrix by a constant (δ), we manipulated the strength of the spatial genetic structure from strong (δ = 0.1; Figure 2) to weak (δ = 6; Figure 2). Delta values were chosen to demonstrate gradual changes in estimates across diversity components. Using this procedure, we generated a matrix of random normally distributed N(0, 1) deviates $\varepsilon_{il}$ for each population i and locus l. The random deviates were transformed into allele frequencies constrained between 0 and 1 using the simple transform

where $p_{il}$ is the relative frequency of allele A1 at the lth locus in population i and, therefore, $q_{il} = (1 - p_{il})$ is the relative frequency of allele A2. Each bi-allelic locus was analysed separately by our framework, and estimated values of $D_\gamma$, $D_\alpha$ and $D_\beta$ for each spatial level (see Figure 1) were averaged across the 10 loci.

To simulate a realistic distribution of number of individuals across populations, we generated random values from a log-normal distribution with mean 0 and log of standard deviation 1; these values were then multiplied by randomly generated deviates from a Poisson distribution with λ = 30 to obtain a wide range of population/community sizes. Rounded values (to mimic abundances of individuals) were then multiplied by $p_{il}$ and $q_{il}$ to generate allele abundances. Given that number of individuals was randomly generated across populations, there is no spatial correlation in abundance of individuals across the landscape, which means that the genetic spatial patterns were solely determined by the variance–covariance matrix used to generate correlated allele frequencies across populations. This facilitates interpretation of the simulation results, allowing us to demonstrate that the framework can uncover subtle spatial effects associated with population connectivity (see below).
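The abundance-generation step can be sketched as follows. This is only an illustration under stated assumptions: the log-normal parameters are read as mean 0 and standard deviation 1 on the log scale, the Poisson deviates are drawn with a simple multiplication method because the Python standard library has no Poisson sampler, and the function names and example frequencies are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) deviate by multiplying uniforms (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_allele_abundances(p, rng, lam=30):
    """p[i] = frequency of allele A1 in population i at one locus.

    Returns the rounded population sizes and, for each population,
    the abundances of alleles A1 and A2.
    """
    sizes = [round(rng.lognormvariate(0.0, 1.0) * poisson(lam, rng)) for _ in p]
    abundances = [(n * p_i, n * (1.0 - p_i)) for n, p_i in zip(sizes, p)]
    return sizes, abundances


rng = random.Random(1)  # fixed seed so the sketch is reproducible
sizes, abundances = simulate_allele_abundances([0.2, 0.5, 1.0], rng)
```

Because the sizes are drawn independently of the allele frequencies, any spatial pattern in the resulting allele abundances comes from the frequencies alone, which is the point made in the paragraph above.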

For each spatial structure, we generated 100 matrices of allele frequencies, and each matrix was analysed separately to obtain distributions for $D_\gamma$, $D_\alpha$, $D_\beta$ and $\Delta_D$. Figure 2 presents heat maps of the correlation in allele frequencies across populations for one simulated data set under each δ value and shows that our algorithm can generate a wide range of genetic structures comparable to those generated by other more complex simulation protocols (e.g., de Villemereuil, Frichot, Bazin, Francois, & Gaggiotti, 2014).

Figure 3 shows the distribution of $D_\alpha$, $D_\beta$ and $\Delta_D$ values for the three levels of geographic variation below the ecosystem level (i.e., $D_\gamma$ genetic diversity). The results clearly show that our framework detects differences in genetic diversity across different levels of spatial genetic structure. As expected, the effective number of alleles ($D_\alpha$ component, top row) increases per region and subregion as the spatial structure becomes weaker (i.e., from small to large δ values) but remains constant at the population level, as there is no spatial structure at this level (i.e., populations are panmictic), so diversity is independent of δ.

The $D_\beta$ component (middle row) quantifies the effective number of aggregates (regions, subregions, populations) at each hierarchical level of spatial subdivision. The larger the number of aggregates at a given level, the more heterogeneous that level is. Thus, it is also a measure of compositional dissimilarity at each level. We use this interpretation to describe the results in a more intuitive manner. As expected, as δ increases, dissimilarity between regions (middle left panel) decreases because spatial genetic structure becomes weaker, and the compositional dissimilarity among populations within subregions (middle right panel) increases because the strong spatial correlation among populations within subregions breaks down (Figure 3, centre left panel). The compositional dissimilarity between subregions within regions (Figure 3, middle centre panel) first increases and then decreases with increasing δ. This is due to an "edge effect" associated with the marginal status of the subregions at the extremes of the species range (extreme right and left subregions in Figure 2). As δ increases, the composition of the two subregions at the centre of the species range, which belong to different regions, changes more rapidly than that of the two marginal subregions. Thus, the compositional dissimilarity between subregions within regions increases. However, as δ continues to increase, spatial effects disappear and dissimilarity decreases.

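$$p_{il} = \begin{cases} 0 & \text{if } \varepsilon_{il} < 0 \\ \varepsilon_{il} & \text{if } 0 \le \varepsilon_{il} \le 1 \\ 1 & \text{if } \varepsilon_{il} > 1 \end{cases}$$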
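The transform from normal deviates to allele frequencies is a simple clipping to the unit interval. A minimal Python sketch, with hypothetical function names, is:

```python
def to_allele_frequency(eps):
    """Clip a normal deviate to [0, 1] to obtain the frequency of allele A1."""
    return min(1.0, max(0.0, eps))


def allele_frequencies(deviates):
    """Return (p, q) pairs for a list of deviates, with q = 1 - p for allele A2."""
    return [(to_allele_frequency(e), 1.0 - to_allele_frequency(e))
            for e in deviates]


# Hypothetical deviates for three populations at one locus
pairs = allele_frequencies([-0.3, 0.4, 1.7])
```

Deviates below 0 or above 1 yield populations fixed for one of the two alleles, so a locus can be monomorphic in some populations while remaining polymorphic in others.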

FIGURE 2 Heat maps of allele frequency correlations between pairs of populations for different δ values. Delta values control the strength of the spatial genetic structure among populations, with low δs having the strongest spatial correlation among populations. Each heat map represents the outcome of a single simulation, and each dot represents the allele frequency correlation between two populations. Thus, the diagonal represents the correlation of a population with itself and is always 1, regardless of the δ value considered in the simulation. Colours indicate range of correlation values. As in Figure 1, the dendrograms represent the spatial relationship (i.e., geographic distance) between populations


The differentiation component $\Delta_D$ (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity $D_\beta$. This is expected, as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix in which different δ values were used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength in spatial genetic variation.

FIGURE 3 Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity gamma is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions)

For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ: $D_\gamma$ = 1.6 on average across simulations for δ = 0.1, up to $D_\gamma$ = 1.9 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here region, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.

7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM

All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish Etelis coruscans (Andrews et al., 2014) and a shallow-water fish Zebrasoma flavescens (Eble et al., 2011).

The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km south-west of the MHI at Johnston Atoll and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).

FIGURE 4 Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass

We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.

7.1 | Species diversity

Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, $D_\gamma$, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region $D_\alpha^{(2)}$ = 37.77; Island $D_\alpha^{(1)}$ = 27.75). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_\beta^{(2)}$ = 1.29 represents the number of region equivalents in the Hawaiian archipelago, while $D_\beta^{(1)}$ = 1.361 is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain $\Delta_D$ (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
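The multiplicative relationships of Table 1 can be checked against the species diversity values reported in Table 4 with a few lines of arithmetic. The variable names below are illustrative; the input values are the gamma and alpha diversities from Table 4.

```python
# Gamma and alpha diversities of order q = 1 reported in Table 4
d_gamma = 48.744    # Hawaiian Archipelago
d_alpha2 = 37.773   # regional level
d_alpha1 = 27.752   # island level

# Beta diversities as ratios of successive levels (Table 1)
d_beta2 = d_gamma / d_alpha2     # region equivalents
d_beta1 = d_alpha2 / d_alpha1    # island equivalents within a region

# Species equivalents lost on descending one level
lost_to_region = d_gamma - d_alpha2
lost_to_island = d_alpha2 - d_alpha1
```

Rounded to three decimals, the two ratios reproduce the beta diversities listed in Table 4, and both differences are close to the "approximately 10 species equivalents" mentioned in the text.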

We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than those of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and to a lesser extent Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
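The effect of the order q on a diversity measure can be illustrated with a small Python sketch of Hill numbers. The function name and the two hypothetical communities are assumptions for illustration; they are not the Hawaiian data.

```python
import math


def hill_number(p, q):
    """Hill number (effective number of species/alleles) of order q."""
    p = [x for x in p if x > 0]
    if abs(q - 1.0) < 1e-12:
        # Order q = 1 is the exponential of Shannon entropy
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))


# Hypothetical community dominated by one species (55% of individuals)
dominated = [0.55] + [0.05] * 9
# Hypothetical community with ten equally common species
even = [0.1] * 10

profile_dominated = [hill_number(dominated, q) for q in (0, 1, 2)]
profile_even = [hill_number(even, q) for q in (0, 1, 2)]
```

For the even community all three orders give the same value, the number of species, whereas for the dominated community the values decrease as q increases because rare species contribute less and less.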

Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.

7.2 | Genetic diversity

Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed

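TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

| Level | Diversity |
|---|---|
| 3 Hawaiian Archipelago | $D_\gamma$ = 48.744 |
| 2 Region | $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)}$ = 37.773; $D_\beta^{(2)}$ = 1.290 |
| 1 Island (community) | $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)}$ = 27.752; $D_\beta^{(1)}$ = 1.361 |

Differentiation among aggregates at each level

| Level | Differentiation |
|---|---|
| 2 Region | $\Delta_D^{(2)}$ = 0.290 |
| 1 Island (community) | $\Delta_D^{(1)}$ = 0.153 |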


among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.

Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.

As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).

8 | DISCUSSION

Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.

Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and, therefore, give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g.,

FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities, (b) genetic diversity for Etelis coruscans, (c) genetic diversity for Zebrasoma flavescens


bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.

As proved by Chao et al. (2015, appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population, and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as $G_{ST}$ and Jost's D, do not possess any of these two properties.

Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and

FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness, (b) allelic richness for Etelis coruscans, (c) allelic richness for Zebrasoma flavescens

[Figure 6 panel titles: (a) Species diversity; (b) Genetic diversity, E. coruscans; (c) Genetic diversity, Z. flavescens]

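TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

| Level | Diversity |
|---|---|
| 3 Hawaiian Archipelago | $D_\gamma$ = 8.249 |
| 2 Region | $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)}$ = 8.083; $D_\beta^{(2)}$ = 1.016 |
| 1 Island (population) | $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)}$ = 7.077; $D_\beta^{(1)}$ = 1.117 |

Differentiation among aggregates at each level

| Level | Differentiation |
|---|---|
| 2 Region | $\Delta_D^{(2)}$ = 0.023 |
| 1 Island (community) | $\Delta_D^{(1)}$ = 0.062 |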


our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.

Recently, Karlin and Smouse (2017; Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then intuitively any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
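The two-population example can be verified numerically. The sketch below applies the two-level form of the differentiation measure from Table 3, $(H_\gamma - H_\alpha)/(-\sum_k w_k \ln w_k)$, to two equally weighted populations with 10 equally frequent alleles each, 4 of them shared; equal weights and the variable names are assumptions made for the illustration.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a frequency vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


# 16 alleles in total: 4 shared, 6 private to population I, 6 private to II
pop1 = [0.1] * 4 + [0.1] * 6 + [0.0] * 6
pop2 = [0.1] * 4 + [0.0] * 6 + [0.1] * 6
weights = [0.5, 0.5]

pooled = [weights[0] * a + weights[1] * b for a, b in zip(pop1, pop2)]
h_gamma = entropy(pooled)
h_alpha = weights[0] * entropy(pop1) + weights[1] * entropy(pop2)

delta = (h_gamma - h_alpha) / -sum(w * math.log(w) for w in weights)
true_nonshared = 1 - 4 / 10
```

Here the entropy difference equals 0.6 ln 2 and the normalizing constant is ln 2, so the measure returns the true nonshared proportion of 60%.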

In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would be surely observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.

Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity $D_\beta$ and differentiation $\Delta_D$ measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.

The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species and, therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.

Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural

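TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

| Level | Diversity |
|---|---|
| 3 Hawaiian Archipelago | $D_\gamma$ = 8.404 |
| 2 Region | $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)}$ = 8.290; $D_\beta^{(2)}$ = 1.012 |
| 1 Island (community) | $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)}$ = 7.690; $D_\beta^{(1)}$ = 1.065 |

Differentiation among aggregates at each level

| Level | Differentiation |
|---|---|
| 2 Region | $\Delta_D^{(2)}$ = 0.014 |
| 1 Island (community) | $\Delta_D^{(1)}$ = 0.033 |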


populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization and, therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.

In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)’s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)’s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon’s entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals, or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.

There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level is transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithm may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and “true dissimilarity” properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.

The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.

An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.


ACKNOWLEDGEMENTS

This work was assisted through participation in “Next Generation Genetic Monitoring” Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center’s Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.

DATA ARCHIVING STATEMENT

All data used in this manuscript are available in DRYAD (https://doi.org/dx.doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).

ORCID

Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493

Christine Edwards http://orcid.org/0000-0001-8837-4872

REFERENCES

Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2

Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101

Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.

Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis “marshi” (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665

Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.

Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.

Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654

Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551

Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247

Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272

Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540

Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349

Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471

Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014

Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819

Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083

Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1

Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2

Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036

Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75

Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352

Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x

Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1

Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x

Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1

Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x

Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum × falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.


Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.

Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.

Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3

Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x

Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.

Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299

Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x

Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x

Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.

Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x

Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.

Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.

Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696

Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B: Biological Sciences, 283(1829).

Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a “pristine” coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z

Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765

Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x

Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.

Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422

Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756

de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705

Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.

Whitlock, M. C. (2011). G’ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x

Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516

Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.

Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593


APPENDIX S1 Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an “ideal” population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the “effective” number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
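The equivalence in the table above can be checked numerically. The sketch below assumes that expected heterozygosity is one minus the sum of squared allele frequencies and that the effective number of alleles is the inverse of that sum; the function names are illustrative and are not part of the authors' code.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def effective_number_of_alleles(freqs):
    """Number of equally frequent alleles that would give the same heterozygosity."""
    return 1.0 / (1.0 - expected_heterozygosity(freqs))


# Allele frequencies taken from the Appendix S1 table.
pop1 = [0.5, 0.5, 0.0, 0.0, 0.0]
pop2 = [0.01, 0.10, 0.665, 0.215, 0.01]

for name, freqs in (("Population 1", pop1), ("Population 2", pop2)):
    he = expected_heterozygosity(freqs)
    ne = effective_number_of_alleles(freqs)
    print(f"{name}: He = {he:.3f}, effective number of alleles = {ne:.2f}")
```

Under these assumptions both populations give a heterozygosity of about 0.5 and an effective number of about two, even though population 2 carries five distinct alleles.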

APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol    Definition
${}^{q}D$    abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$    phylogenetic diversity of order q
$S$    total number of distinct elements (species/alleles)
$H$    Shannon entropy
$K$    number of regions
$J_k$    number of local populations/communities within region k
$w_{jk}$    weight given to population/community j of region k
$w_{+k}$    weight given to region k
$D_\alpha^{(l)}$    alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$    beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$    gamma abundance diversity at level l of the hierarchy
$l$    superscript to denote population/community, subregion, region, …
$D_\gamma$    gamma abundance diversity at the ecosystem level
$N_{injk}$    number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$    total number of copies of allele/species i in population/community j of region k
$N_{+jk}$    total number of alleles/individuals in population j of region k
$N_{++k}$    total number of alleles/individuals in region k
$N_{+++}$    total number of alleles/individuals in the ecosystem
$p_{i|jk}$    frequency of allele/species i in population j of region k
$p_{i|+k}$    frequency of allele/species i in region k
$p_{i|++}$    frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$    alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$    alpha entropy for region k
$H_\alpha^{(l)}$    total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$    abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$    number of branch segments in the phylogenetic tree
$L_i$    length of branch i = 1, 2, 3, …, B
$a_i$    total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$    mean branch length
$T$    depth of an ultrametric tree ($= \bar{T}$)
$Q$    Rao’s quadratic entropy
$I$    phylogenetic entropy
$a_{i|jk}$    total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$    total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$    total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$    alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$    beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$    gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$    gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$    alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$    alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$    total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$    phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)


APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term “allele frequency” with “species frequency”. We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \dots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1 For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \dots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)\Big] \\
&\le -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\big(w_j p_{i|j}\big)\Big] \\
&= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j \\
&= H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.
\end{aligned}$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
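The bounds in Theorem S3.1 can also be checked numerically. The following is a minimal sketch, not the authors' iDIP code: it computes the gamma and alpha entropies for a two-level hierarchy and divides their difference by the upper bound, so that the result should lie between 0 and 1. The function names and the toy frequencies are assumptions made for illustration.

```python
import math


def shannon_entropy(freqs):
    """Shannon entropy with natural logarithm; zero frequencies contribute nothing."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def shannon_differentiation(pops, weights):
    """Ratio (H_gamma - H_alpha) / (-sum_j w_j ln w_j) for a two-level hierarchy.

    pops    : list of allele-frequency lists, one per population (same allele order)
    weights : population weights w_j, summing to 1
    """
    n_alleles = len(pops[0])
    # Pooled frequencies p_{i|+} = sum_j w_j p_{i|j}
    pooled = [sum(w * pop[i] for w, pop in zip(weights, pops))
              for i in range(n_alleles)]
    h_gamma = shannon_entropy(pooled)
    h_alpha = sum(w * shannon_entropy(pop) for w, pop in zip(weights, pops))
    upper_bound = shannon_entropy(weights)  # equals -sum_j w_j ln w_j
    return (h_gamma - h_alpha) / upper_bound


# Hypothetical examples with unequal weights.
weights = [0.4, 0.6]
identical = [[0.5, 0.3, 0.2, 0.0], [0.5, 0.3, 0.2, 0.0]]
distinct = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.7, 0.3]]
partial = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.4, 0.3, 0.3]]

for label, pops in (("identical", identical), ("distinct", distinct), ("partial", partial)):
    print(label, round(shannon_differentiation(pops, weights), 4))
```

With identical populations the ratio is numerically zero, with completely distinct populations it reaches one, and partially overlapping populations fall in between, as the theorem states.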


From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \qquad \text{(C1)}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2 Desirable monotonicity and “true dissimilarity” properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:

(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.

(A2) The Shannon differentiation measure is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the “true dissimilarity” property: if multiple communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.

Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++},$$

$$H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \qquad \text{(C2)}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3 For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C3)}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \qquad \text{(C4)}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions, with allele relative frequency $p_{i|+k}$ for allele i in region k, and with the region weights $(w_{+1}, w_{+2}, \dots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the “gamma” entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} w_{+k} p_{i|+k}\Big) \ln\Big(\sum_{m=1}^{K} w_{+m} p_{i|+m}\Big).$$

The corresponding “alpha” entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations, with allele relative frequency $p_{i|jk}$ for allele i in population j, and with population weights $(w_{1k}/w_{+k}, w_{2k}/w_{+k}, \dots, w_{J_k k}/w_{+k})$, j = 1, 2, …, $J_k$. The “gamma” entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding “alpha” entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \qquad \text{(C5)}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \qquad \text{(C6)}$$

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\big(w_{jk} p_{i|jk}\big)\Big] \\
&= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big(\ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big)\Big] \\
&= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk}\Big] - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \frac{w_{jk}}{w_{+k}}\Big] - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k}\Big] \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big].
\end{aligned}$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
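The three-level decomposition in Eq. (C2) and the two normalized measures in Eqs. (C5) and (C6) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' iDIP implementation: the nested-list data layout, the function names and the toy frequencies are all hypothetical, and every region is assumed to contain at least two populations so that the denominator of Eq. (C6) is non-zero.

```python
import math


def entropy(freqs):
    """Shannon entropy with natural logarithm; zero frequencies contribute nothing."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def three_level_partition(freqs, weights):
    """Entropies and normalized differentiation for ecosystem > region > population.

    freqs[k][j]   : allele-frequency list of population j in region k
    weights[k][j] : weight w_jk of that population (all weights sum to 1)
    """
    n_alleles = len(freqs[0][0])
    w_region = [sum(ws) for ws in weights]  # w_{+k}
    # Region-level frequencies p_{i|+k} = sum_j (w_jk / w_{+k}) p_{i|jk}
    p_region = [
        [sum(w * pop[i] for w, pop in zip(ws, pops)) / wk for i in range(n_alleles)]
        for pops, ws, wk in zip(freqs, weights, w_region)
    ]
    # Ecosystem-level frequencies p_{i|++} = sum_k w_{+k} p_{i|+k}
    p_total = [sum(wk * pr[i] for wk, pr in zip(w_region, p_region))
               for i in range(n_alleles)]

    h_gamma = entropy(p_total)
    h_alpha2 = sum(wk * entropy(pr) for wk, pr in zip(w_region, p_region))
    h_alpha1 = sum(w * entropy(pop)
                   for pops, ws in zip(freqs, weights)
                   for w, pop in zip(ws, pops))

    max_among_regions = entropy(w_region)              # bound in Eq. (C3)
    max_within_regions = -sum(w * math.log(w / wk)     # bound in Eq. (C4)
                              for ws, wk in zip(weights, w_region)
                              for w in ws if w > 0)
    return {
        "H_gamma": h_gamma,
        "H_alpha2": h_alpha2,
        "H_alpha1": h_alpha1,
        "Delta2": (h_gamma - h_alpha2) / max_among_regions,    # Eq. (C5)
        "Delta1": (h_alpha2 - h_alpha1) / max_within_regions,  # Eq. (C6)
    }


# Hypothetical data: two regions holding two and three populations, equal weights.
freqs = [
    [[0.6, 0.4, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]],
    [[0.1, 0.1, 0.4, 0.4], [0.0, 0.2, 0.4, 0.4], [0.0, 0.0, 0.5, 0.5]],
]
weights = [[0.2, 0.2], [0.2, 0.2, 0.2]]
result = three_level_partition(freqs, weights)
for key, value in result.items():
    print(f"{key}: {value:.4f}")
```

Taking the exponential of each entropy would give the corresponding effective numbers, in line with the order q = 1 framework used in the main text.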

As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and “true dissimilarity” properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i\, a_{i|++} \ln a_{i|++},$$

$$I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i\, a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i\, a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[I_\alpha^{(2)} - I_\alpha^{(1)}\big] + \big[I_\gamma - I_\alpha^{(2)}\big]. \qquad \text{(C7)}$$


Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4 For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C8)}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \qquad \text{(C9)}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and “true dissimilarity” properties, are parallel to those of allelic diversity and thus are omitted.


SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represent the ecosystem. Shannon entropies $H_{*}$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using Eq. 1.

[Diagram: hierarchy levels 1 (Population or Community), 2 (Region) and 3 (Ecosystem), with columns Within, Between, Total and Decomposition. The frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$, $p_{i|+1}$, $p_{i|+2}$ and $p_{i|++}$ feed the entropies $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$, $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$, $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$, $H_\alpha^{(1)}$, $H_\alpha^{(2)}$ and $H_\gamma$.]

2

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set, where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for the 8 nodes is as follows: $\{a_1, \dots, a_8\} = \{p_1, \dots, p_5,\ p_1 + p_2,\ p_1 + p_2 + p_3,\ p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2 panel: tree with tips $p_1$ to $p_5$, internal nodes $p_1 + p_2$, $p_4 + p_5$ and $p_1 + p_2 + p_3$, and the root labelled MRCA.]
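The extended abundance set of Figure S2 can be built in a few lines of R. This is a minimal sketch under stated assumptions: the relative abundances `p` and the branch lengths `L` are hypothetical values chosen so that the tree is ultrametric with depth 4; only the node ordering follows the figure caption.

```r
# Hypothetical relative abundances of the five tip species/alleles (they sum to 1)
p <- c(0.10, 0.20, 0.30, 0.25, 0.15)

# Extended set: five tips, then internal nodes (1,2), (1,2,3) and (4,5)
a <- c(p, p[1] + p[2], p[1] + p[2] + p[3], p[4] + p[5])

# Hypothetical branch lengths, in the same order, for an ultrametric tree of depth 4
L <- c(1, 1, 2, 2, 2, 1, 2, 2)

T_bar <- sum(L * a)                # abundance-weighted mean distance from root to tips
I     <- -sum(L * a * log(a))      # phylogenetic entropy
PD1   <- T_bar * exp(I / T_bar)    # effective total branch length of order q = 1
print(c(T_bar = T_bar, PD1 = PD1, Faith_PD = sum(L)))
```

Because the hypothetical tree is ultrametric, `T_bar` equals the tree depth, and the order-1 measure lies between the tree depth and the total branch length.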

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1
        Population 1
        Population 2
    Region 2
        Population 3
        Population 4
        Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

          Pop1  Pop2  Pop3  Pop4  Pop5
Allele1      1    16     2    10    15
Allele2      0     0     0     5    14
Allele3      7    12    11     1     0
Allele4      0     5    14     1    21
Allele5      2     1     0    11    10
Allele6      0     1     3     2     0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

          Pop1         Pop2         Pop3         Pop4         Pop5
Level3    Ecosystem    Ecosystem    Ecosystem    Ecosystem    Ecosystem
Level2    Region1      Region1      Region2      Region2      Region2
Level1    Population1  Population2  Population3  Population4  Population5

For the above simple example, input data for the R function iDIP (code given below) are shown below.

# Input data
Data=cbind(c(1,0,7,0,2,0),c(16,0,12,5,1,1),c(2,0,11,14,0,3),c(10,5,1,1,11,2),c(15,14,0,21,10,0))
Struc=rbind(rep("Ecosystem",5),c(rep("Region1",2),rep("Region2",3)),paste0("Population",1:5))

# Run R function
iDIP(Data,Struc)

Output is given below.

                   [,1]
D_gamma           5.272
D_alpha2          4.679
D_alpha1          3.540
D_beta2           1.127
D_beta1           1.322
Proportion2       0.300
Proportion1       0.700
Differentiation2  0.204
Differentiation1  0.310

We give simple interpretations for the above output (all the "effective" is in a sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 is interpreted as that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha2 = 4.679 is interpreted as that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma), up to rounding.

(1c) D_alpha1 = 3.540 is interpreted as that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 is interpreted as that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2), up to rounding.

(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
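The interpretations above can be cross-checked directly from the reported numbers. The short R sketch below only re-uses the printed output values (rounded to three decimals, so agreement holds up to rounding error). It verifies the multiplicative decomposition and recovers the proportions as ratios of log beta diversities, which is how the iDIP code defines them from the entropy differences.

```r
# Values copied from the iDIP output (rounded to three decimals)
D_gamma <- 5.272; D_alpha2 <- 4.679; D_alpha1 <- 3.540
D_beta2 <- 1.127; D_beta1  <- 1.322

# Multiplicative decomposition: alpha times beta recovers the next level up
print(c(D_alpha2 * D_beta2, D_gamma))    # both close to 5.27
print(c(D_alpha1 * D_beta1, D_alpha2))   # both close to 4.68

# Proportion of total beta information at each level = log(beta) / log(gamma / alpha1)
prop2 <- log(D_beta2) / log(D_gamma / D_alpha1)
prop1 <- log(D_beta1) / log(D_gamma / D_alpha1)
print(round(c(prop2, prop1), 2))         # close to 0.30 and 0.70
```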

R code for Information-based Diversity Partitioning for Any Number of Levels

# Input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
# Output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level
#   (2) Proportion of total beta information found at each level
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.

# Main function iDIP
iDIP=function(abun,struc){
  n=sum(abun); N=ncol(abun)
  ga=rowSums(abun)
  gp=ga[ga>0]/n
  G=sum(-gp*log(gp))                       # ecosystem-level (gamma) entropy
  S=length(gp)
  H=nrow(struc)
  A=numeric(H-1); W=numeric(H-1); B=numeric(H-1)
  Diff=numeric(H-1); Prop=numeric(H-1)
  # lowest level: populations/communities
  wi=colSums(abun)/n
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]))
  pi=sapply(1:N,function(k) abun[,k]/sum(abun[,k]))
  Ai=sapply(1:N,function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])))
  A[H-1]=sum(wi*Ai)
  # intermediate levels: pool populations within each aggregate
  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,]); NN=length(I)
      ai=matrix(0,ncol=NN,nrow=nrow(abun))
      for(j in 1:NN){
        II=which(struc[i,]==I[j])
        if(length(II)==1){ai[,j]=abun[,II]
        }else{ai[,j]=rowSums(abun[,II])}
      }
      pi=sapply(1:NN,function(k) ai[,k]/sum(ai[,k]))
      wi=colSums(ai)/sum(ai)
      W[i-1]=-sum(wi*log(wi))
      Ai=sapply(1:NN,function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])))
      A[i-1]=sum(wi*Ai)
    }
  }
  total=G-A[H-1]
  Diff[1]=(G-A[1])/W[1]
  Prop[1]=(G-A[1])/total
  B[1]=exp(G)/exp(A[1])
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1])
      Prop[i]=(A[i-1]-A[i])/total
      B[i]=exp(A[i-1])/exp(A[i])
    }
  }
  Gamma=exp(G); Alpha=exp(A)
  out=matrix(c(Gamma,Alpha,B,Prop,Diff),ncol=1)
  rownames(out)<-c(paste0("D_gamma"),
                   paste0("D_alpha",(H-1):1),
                   paste0("D_beta",(H-1):1),
                   paste0("Proportion",(H-1):1),
                   paste0("Differentiation",(H-1):1))
  return(out)
}

Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described in the species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree format. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for the species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity for illustration. A simulated phylogenetic tree for 6 species is given below.

# Input data
Data=cbind(c(1,0,7,0,2,0),c(16,0,12,5,1,1),c(2,0,11,14,0,3),c(10,5,1,1,11,2),c(15,14,0,21,10,0))
rownames(Data)=paste0("Allele",1:6)
Struc=rbind(rep("Ecosystem",5),c(rep("Region1",2),rep("Region2",3)),paste0("Community",1:5))
Tree=c("((((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125));")

# Run R function
iDIPphylo(Data,Struc,Tree)

Output is given below.

               [,1]
Faith's PD  321.525
mean_T       94.169
PD_gamma    274.388
PD_alpha2   255.194
PD_alpha1   223.231
PD_beta2      1.075
PD_beta1      1.143
PD_prop2      0.351
PD_prop1      0.649
PD_diff2      0.124
PD_diff1      0.149

The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 is interpreted as that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha2 = 255.194 is interpreted as that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma), up to rounding.

(3c) PD_alpha1 = 223.231 is interpreted as that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2), up to rounding.

(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.

R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

# Input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
#   tree:  a Newick-format phylogenetic tree spanned by all focal species considered in a study
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA"
#       in your data by 0 or any numerical value. Also, there must be at least one
#       species/allele in a population or a community.
# Output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root node
#       to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information
#       found in Level 1 and Level 2, respectively
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1
#       measures the mean phylogenetic dissimilarity among communities (Level 1) within a
#       region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among regions (Level 2), etc.

# Three packages ("ade4", "ape" and "phytools") must be installed first
install.packages("ade4"); library(ade4)
install.packages("ape"); library(ape)
install.packages("phytools"); library(phytools)

# Main function iDIPphylo
iDIPphylo=function(abun,struc,tree){
  phyloData<-newick2phylog(tree)
  Temp<-as.matrix(abun[names(phyloData$leaves),])
  nodenames=c(names(phyloData$leaves),names(phyloData$nodes))
  # M[i, j] = 1 if node j lies on the path from the root to leaf i
  M=matrix(0,nrow=length(phyloData$leaves),ncol=length(nodenames),
           dimnames=list(names(phyloData$leaves),nodenames))
  for(i in 1:length(phyloData$leaves)){
    M[i,][unlist(phyloData$paths[i])]=rep(1,length(unlist(phyloData$paths[i])))
  }
  # pA: node-by-population abundances; pB: branch lengths
  pA=matrix(0,ncol=ncol(abun),nrow=length(nodenames),dimnames=list(nodenames,colnames(abun)))
  for(i in 1:ncol(abun)){pA[,i]=Temp[,i]%*%M}
  pB=c(phyloData$leaves,phyloData$nodes)
  n=sum(abun); N=ncol(abun)
  ga=rowSums(pA)
  gp=ga/n; TT=sum(gp*pB)
  G=sum(-pB[gp>0]*gp[gp>0]/TT*log(gp[gp>0]/TT))
  PD=sum(pB[gp>0])
  H=nrow(struc)
  A=numeric(H-1); W=numeric(H-1); B=numeric(H-1)
  Diff=numeric(H-1); Prop=numeric(H-1)
  wi=colSums(abun)/n
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]))
  pi=sapply(1:N,function(k) pA[,k]/sum(abun[,k]))
  Ai=sapply(1:N,function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)))
  A[H-1]=sum(wi*Ai)
  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,]); NN=length(I)
      pi=matrix(0,ncol=NN,nrow=nrow(pA)); ni=numeric(NN)
      for(j in 1:NN){
        II=which(struc[i,]==I[j])
        if(length(II)==1){pi[,j]=pA[,II]/sum(abun[,II]); ni[j]=sum(abun[,II])
        }else{pi[,j]=rowSums(pA[,II])/sum(abun[,II]); ni[j]=sum(abun[,II])}
      }
      wi=ni/sum(ni)
      W[i-1]=-sum(wi*log(wi))
      Ai=sapply(1:NN,function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)))
      A[i-1]=sum(wi*Ai)
    }
  }
  total=G-A[H-1]
  Diff[1]=(G-A[1])/W[1]
  Prop[1]=(G-A[1])/total
  B[1]=exp(G)/exp(A[1])
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1])
      Prop[i]=(A[i-1]-A[i])/total
      B[i]=exp(A[i-1])/exp(A[i])
    }
  }
  Gamma=exp(G); Alpha=exp(A)
  out=matrix(c(PD,TT,Gamma,Alpha,B,Prop,Diff),ncol=1)
  rownames(out)<-c(paste("Faith's PD"),
                   paste("mean_T"),
                   paste0("PD_gamma"),
                   paste0("PD_alpha",(H-1):1),
                   paste0("PD_beta",(H-1):1),
                   paste0("PD_prop",(H-1):1),
                   paste0("PD_diff",(H-1):1))
  return(out)
}


weight for region k thus becomes $w_{+k} = \sum_{j=1}^{J_k} w_{jk} = N_{++k}/N_{+++}$. Table 2 describes how allele/species relative frequencies at each level are calculated in terms of these weight functions.

Using these frequencies, we can calculate the genetic diversities at each level of spatial organization. Table 3 presents the formulas for $D_{\alpha}^{(1)}$, $D_{\alpha}^{(2)}$ and $D_{\gamma}$; all other diversity measures can be derived from them (see Table 1). In the case of the ecosystem diversity, this amounts to simply replacing $p_i$ in Equation 2 by $p_{i|++}$, the allele frequency at the ecosystem level (see Table 2). To calculate the diversity at the regional level, we first calculate the entropy $H_{\alpha k}^{(2)}$ for each individual region k and then obtain the weighted average over all regions, $H_{\alpha}^{(2)}$. Finally, we calculate the exponential of the region-level entropy to obtain $D_{\alpha}^{(2)}$, the alpha diversity at the regional level. We proceed in a similar fashion to obtain $D_{\alpha}^{(1)}$, the diversity at the population level, but in this case we need to average over regions and populations within regions.

The calculation of the equivalent diversities based on species count data can be carried out using the exact same procedure described above, but in this case $N_{ijk}$ represents the number of individuals of species i in population j and region k. All formulas for gamma, alpha and beta, along with the differentiation measures at each level, are given in Table 3. The formulas can be directly generalized to any arbitrary number of levels (see Section 5).
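The procedure just described can be traced step by step on a small data set. The sketch below is not the authors' implementation: it re-uses the allele counts of the worked example in Data S1 and assumes size-based weights ($w_{jk} = N_{+jk}/N_{+++}$); the names `N`, `region` and `shannon` are introduced here for illustration only.

```r
# Allele counts from the Data S1 example: rows = alleles, columns = populations
N <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
           c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)   # region membership of each population

shannon <- function(p) -sum(p[p > 0] * log(p[p > 0]))

# Ecosystem level: entropy of the pooled frequencies p_{i|++}
D_gamma <- exp(shannon(rowSums(N) / sum(N)))

# Regional level: pool counts within regions, then average region entropies with weights w_{+k}
N_reg <- sapply(split(seq_len(ncol(N)), region),
                function(j) rowSums(N[, j, drop = FALSE]))
w_reg <- colSums(N_reg) / sum(N_reg)
D_alpha2 <- exp(sum(w_reg * apply(N_reg, 2, function(x) shannon(x / sum(x)))))

# Population level: average population entropies with weights w_{jk}
w_pop <- colSums(N) / sum(N)
D_alpha1 <- exp(sum(w_pop * apply(N, 2, function(x) shannon(x / sum(x)))))

print(round(c(D_gamma = D_gamma, D_alpha2 = D_alpha2, D_alpha1 = D_alpha1), 3))
```

The three values should agree with the D_gamma, D_alpha2 and D_alpha1 rows of the iDIP output reported in Data S1, and the beta diversities then follow as ratios (Table 1).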

4.2 | Formulation in terms of phylogenetic diversity

We first present an overview of phylogenetic diversity measures applied to a single nonhierarchical case (henceforth referred to as single aggregate for brevity) and then extend it to consider a hierarchically structured system.

4.2.1 | Phylogenetic diversity measures in a single aggregate

To formulate phylogenetic diversity in a single aggregate, we assume that all species or alleles in an aggregate are connected by a rooted ultrametric or nonultrametric phylogenetic tree, with all species/alleles as tip nodes. All phylogenetic diversity measures discussed below are computed from a given fixed tree base or a time reference point that is ancestral to all species/alleles in the aggregate. A convenient time reference point is the age of the root of the phylogenetic tree spanned by all elements. Assume that there are B branch segments in the tree and thus there are B corresponding nodes, $B \ge S$. The set of species/alleles is expanded to include also the internal nodes as well as the terminal nodes representing species/alleles, which will then be the first S elements (see Figure S2).

Let $L_i$ denote the length of branch i in the tree, i = 1, 2, …, B. We first expand the set of relative abundances of elements $(p_1, p_2, \cdots, p_S)$ (see Equation 1) to a larger set $\{a_i, i = 1, 2, \cdots, B\}$ by defining $a_i$ as the total relative abundance of the elements descended from the ith node/branch, i = 1, 2, …, B. In phylogenetic diversity, an important parameter is the mean branch length $\bar{T}$, the abundance-weighted mean of the distances from the tree base to each of the terminal branch tips, that is, $\bar{T} = \sum_{i=1}^{B} L_i a_i$. For an ultrametric tree, the mean branch length is simply reduced to the tree depth $T$; see Figure 1 in Chao, Chiu, and Jost (2010) for an example. For simplicity, our following formulation of phylogenetic diversity is based on ultrametric trees. The extension to nonultrametric trees is straightforward (via replacing $T$ by $\bar{T}$ in all formulas).

FIGURE 1 The spatial representation of 32 populations organized into a spatial hierarchy based on three scale levels: subregions (eight populations each), regions (16 populations each) and the ecosystem (all 32 populations). The dendrogram (upper panel: hierarchical representation of levels) represents the spatial relationship (i.e., geographic distance), in which each tip represents a population found in a particular site (lower panel). The cartographic representation (lower panel) represents the spatial distribution of these same populations along a geographic coordinate system.

Chao et al. (2010, 2014) generalized Hill numbers to a class of phylogenetic diversity of order q, $^qPD$, derived as

$$^qPD = \left[\sum_{i=1}^{B} L_i \left(\frac{a_i}{T}\right)^{q}\right]^{1/(1-q)}. \qquad (4)$$

This measure quantifies the effective total branch length during the time interval from $T$ years ago to the present. If q = 0, then $^0PD = \sum_{i=1}^{B} L_i$, which is the well-known Faith's PD, the sum of the branch lengths of a phylogenetic tree connecting all species. However, this measure does not consider species abundances. Rao's quadratic entropy Q (Rao & Nayak, 1985) is a widely used measure which takes into account both phylogeny and species abundances. This measure is a generalization of the Gini–Simpson index and quantifies the average phylogenetic distance between any two individuals randomly selected from the assemblage. Chao et al. (2010) showed that the $^qPD$ measure of order q = 2 is a simple transformation of quadratic entropy, that is, $^2PD = T/(1 - Q/T)$. Again, here we focus on the $^qPD$ measure of order q = 1, which can be expressed as a function of the phylogenetic entropy (Allen, Kon, & Bar-Yam, 2009):

$$^1PD = \lim_{q \to 1} {}^qPD = \exp\left[-\sum_{i=1}^{B} L_i \frac{a_i}{T} \ln\left(\frac{a_i}{T}\right)\right] \equiv T \exp(I/T). \qquad (5)$$

Here I denotes the phylogenetic entropy,

$$I = -\sum_{i=1}^{B} L_i a_i \ln a_i, \qquad (6)$$

which is a generalization of Shannon's entropy that incorporates phylogenetic distances among elements. Note that when there are only tip nodes and all branches have unit length, then we have $T = 1$ and $^qPD$ reduces to the Hill number of order q (in Equation 1).

4.2.2 | Phylogenetic diversity decomposition in a multiple-level hierarchically structured system

The single-aggregate formulation can be extended to consider a hierarchical spatially structured system. For the sake of simplicity, we consider three levels (ecosystem, region and community/population), as we did for the species/allelic diversity decomposition. Assume that there are S elements in the ecosystem. For the rooted phylogenetic tree spanned by all S elements in the ecosystem, we define the root (or a time reference point), the number of nodes/branches B and the branch length $L_i$ in a similar manner as those in a single aggregate.

For the tip nodes, as in the framework of species and allelic diversity (in Table 2), define $p_{i|jk}$, $p_{i|+k}$ and $p_{i|++}$, i = 1, 2, …, S, as the ith species or allele relative frequencies at the population, regional and ecosystem level, respectively. To expand these relative frequencies to the branch set, we define $a_{i|jk}$, i = 1, 2, …, B, as the summed relative abundance of the species/alleles descended from the ith node/branch in population j and region k, with similar definitions for $a_{i|+k}$ and $a_{i|++}$, i = 1, 2, …, B; see Figure 1 of Chao et al. (2015) for an illustrative example. The decomposition for phylogenetic diversity is similar to that for Hill numbers presented in Table 1, except that now all measures are replaced by phylogenetic diversity. The corresponding phylogenetic gamma, alpha and beta diversities at each level are

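TABLE 1 Various diversities in a hierarchically structured system and their decomposition based on diversity measure $D = {}^1D$ (Hill number of order q = 1 in Equation 2); for phylogenetic diversity decomposition, replace D with $PD = {}^1PD$ (phylogenetic diversity measure of order q = 1 in Equation 5); see Table 3 for all formulas for D and PD. The superscripts (1) and (2) denote the hierarchical level of focus.

Hierarchical level | Within | Between | Total | Decomposition
3 Ecosystem | − | − | $D_{\gamma}$ | $D_{\gamma} = D_{\alpha}^{(1)} D_{\beta}^{(1)} D_{\beta}^{(2)}$
2 Region | $D_{\alpha}^{(2)}$ | $D_{\beta}^{(2)} = D_{\gamma}^{(2)}/D_{\alpha}^{(2)}$ | $D_{\gamma}^{(2)} = D_{\gamma}$ | $D_{\gamma} = D_{\alpha}^{(2)} D_{\beta}^{(2)}$
1 Community or population | $D_{\alpha}^{(1)}$ | $D_{\beta}^{(1)} = D_{\gamma}^{(1)}/D_{\alpha}^{(1)}$ | $D_{\gamma}^{(1)} = D_{\alpha}^{(2)}$ | $D_{\alpha}^{(2)} = D_{\alpha}^{(1)} D_{\beta}^{(1)}$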

TABLE 2 Calculation of allele/species relative frequencies at the different levels of the hierarchical structure

Hierarchical level | Species/allele relative frequency
Population | $p_{i|jk} = N_{ijk}/N_{+jk} = N_{ijk} / \sum_{i=1}^{S} N_{ijk}$
Region | $p_{i|+k} = N_{i+k}/N_{++k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$
Ecosystem | $p_{i|++} = N_{i++}/N_{+++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk}$

given in Table 3, along with the corresponding differentiation measures. Appendix S3 presents all mathematical derivations and discusses the desirable monotonicity and "true dissimilarity" properties that our proposed differentiation measures possess.

5 | IMPLEMENTATION OF THE FRAMEWORK BY MEANS OF AN R PACKAGE

The framework described above has been implemented in the R function iDIP (information-based Diversity Partitioning), which is provided as Data S1. We also provide a short introduction with a simple example data set to explain how to obtain numerical results equivalent to those provided in Tables 4 and 5 below for the Hawaiian archipelago example data set.

The R function iDIP requires two input matrices:

1. Abundance data specifying species/alleles (rows) raw or relative abundances for each population/community (columns).

2. Structure matrix describing the hierarchical structure of spatial subdivision; see a simple example given in Data S1. There is no limit to the number of spatial subdivisions.

The output includes (i) gamma (or total) diversity, alpha and beta diversity for each level; (ii) proportion of total beta information (among aggregates) found at each level; and (iii) mean differentiation (dissimilarity) at each level.

We also provide the R function iDIPphylo, which implements an information-based decomposition of phylogenetic diversity and therefore can take into account the evolutionary history of the species being studied. This function requires the two matrices mentioned above plus a phylogenetic tree in Newick format. For interested users without knowledge of R, we also provide an online version available from https://chao.shinyapps.io/iDIP/. This interactive web application was developed using Shiny (https://shiny.rstudio.com). The webpage contains tabs providing a short introduction describing how to use the tool, along with a detailed User's Guide, which provides proper interpretations of the output through numerical examples.

6 | SIMULATION STUDY TO SHOW THE CHARACTERISTICS OF THE FRAMEWORK

Here we describe a simple simulation study to demonstrate the utility and numerical behaviour of the proposed framework. We considered an ecosystem composed of 32 populations divided into four hierarchical levels (ecosystem, region, subregion, population; Figure 1). The number of populations at each level was kept constant across all simulations (i.e., ecosystem with 32 populations, regions with 16 populations each and subregions with eight

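TABLE 3 Formulas for α, β and γ, along with differentiation measures, at each hierarchical level of spatial subdivision for species/allelic diversity and phylogenetic diversity. Here $D = {}^1D$ (Hill number of order q = 1 in Equation 2), $PD = {}^1PD$ (phylogenetic diversity of order q = 1 in Equation 5), $T$ denotes the depth of an ultrametric tree, H = Shannon entropy (Equation 2), I = phylogenetic entropy (Equation 6).

Hierarchical level | Diversity | Species/allelic diversity | Phylogenetic diversity

Level 3: Ecosystem | gamma | $D_{\gamma} = \exp\left(-\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}\right) \equiv \exp(H_{\gamma})$ | $PD_{\gamma} = T \times \exp\left(-\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++} / T\right) \equiv T \times \exp(I_{\gamma}/T)$

Level 2: Region | gamma | $D_{\gamma}^{(2)} = D_{\gamma}$ | $PD_{\gamma}^{(2)} = PD_{\gamma}$
Level 2: Region | alpha | $D_{\alpha}^{(2)} = \exp(H_{\alpha}^{(2)})$, where $H_{\alpha}^{(2)} = \sum_{k} w_{+k} H_{\alpha k}^{(2)}$ and $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$ | $PD_{\alpha}^{(2)} = T \times \exp(I_{\alpha}^{(2)}/T)$, where $I_{\alpha}^{(2)} = \sum_{k} w_{+k} I_{\alpha k}^{(2)}$ and $I_{\alpha k}^{(2)} = -\sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k}$
Level 2: Region | beta | $D_{\beta}^{(2)} = D_{\gamma}^{(2)}/D_{\alpha}^{(2)}$ | $PD_{\beta}^{(2)} = PD_{\gamma}^{(2)}/PD_{\alpha}^{(2)}$

Level 1: Population or community | gamma | $D_{\gamma}^{(1)} = D_{\alpha}^{(2)}$ | $PD_{\gamma}^{(1)} = PD_{\alpha}^{(2)}$
Level 1: Population or community | alpha | $D_{\alpha}^{(1)} = \exp(H_{\alpha}^{(1)})$, where $H_{\alpha}^{(1)} = \sum_{j,k} w_{jk} H_{\alpha jk}^{(1)}$ and $H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}$ | $PD_{\alpha}^{(1)} = T \times \exp(I_{\alpha}^{(1)}/T)$, where $I_{\alpha}^{(1)} = \sum_{j,k} w_{jk} I_{\alpha jk}^{(1)}$ and $I_{\alpha jk}^{(1)} = -\sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}$
Level 1: Population or community | beta | $D_{\beta}^{(1)} = D_{\gamma}^{(1)}/D_{\alpha}^{(1)}$ | $PD_{\beta}^{(1)} = PD_{\gamma}^{(1)}/PD_{\alpha}^{(1)}$

Differentiation among aggregates at each level

Level 2: Among regions | $\Delta_{D}^{(2)} = \dfrac{H_{\gamma} - H_{\alpha}^{(2)}}{-\sum_{k} w_{+k} \ln w_{+k}}$ | $\Delta_{PD}^{(2)} = \dfrac{I_{\gamma} - I_{\alpha}^{(2)}}{-T \sum_{k} w_{+k} \ln w_{+k}}$
Level 1: Population/community within region | $\Delta_{D}^{(1)} = \dfrac{H_{\alpha}^{(2)} - H_{\alpha}^{(1)}}{-\sum_{j,k} w_{jk} \ln(w_{jk}/w_{+k})}$ | $\Delta_{PD}^{(1)} = \dfrac{I_{\alpha}^{(2)} - I_{\alpha}^{(1)}}{-T \sum_{j,k} w_{jk} \ln(w_{jk}/w_{+k})}$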

populations each). Note that here we used a hierarchy with four spatial subdivisions instead of three levels as used in the presentation of the framework. This decision was based on the fact that we wanted to simplify the presentation of calculations (three levels used), and in the simulations (four levels used) we wanted to verify the performance of the framework in a more in-depth manner.

We explored six scenarios varying in the degree of genetic structuring, from very strong (Figure 2, top left panel) to very weak (Figure 2, bottom right panel), and for each we generated spatially structured genetic data for 10 unlinked bi-allelic loci using an algorithm loosely based on the genetic model of Coop, Witonsky, Di Rienzo, and Pritchard (2010). More explicitly, to generate correlated allele frequencies across populations for bi-allelic loci, we draw 10 random vectors of dimension 32 from a multivariate normal distribution with mean zero and a covariance matrix corresponding to the particular genetic structure scenario being considered. To construct the covariance matrix, we first assumed that the covariance between populations decreased with distance, so that the off-diagonal elements (covariances) for closest geographic neighbours were set to 4, for the second nearest neighbours were set to 3, and so on; as such, the main diagonal values (variance) were set to 5. By multiplying the off-diagonal elements of this variance–covariance matrix by a constant (δ), we manipulated the strength of the spatial genetic structure from strong (δ = 0.1; Figure 2) to weak (δ = 6; Figure 2). Delta values were chosen to demonstrate gradual changes in estimates across diversity components. Using this procedure, we generated a matrix of random normally distributed N(0, 1) deviates $\varepsilon_{il}$ for each population i and locus l. The random deviates were transformed into allele frequencies constrained between 0 and 1 using the simple transform

$$p_{il} = \begin{cases} 0 & \text{if } \varepsilon_{il} < 0 \\ \varepsilon_{il} & \text{if } 0 \le \varepsilon_{il} \le 1 \\ 1 & \text{if } \varepsilon_{il} > 1 \end{cases}$$

where $p_{il}$ is the relative frequency of allele A1 at the lth locus in population i, and therefore $q_{il} = (1 - p_{il})$ is the relative frequency of allele A2. Each bi-allelic locus was analysed separately by our framework, and estimated values of $D_{\gamma}$, $D_{\alpha}$ and $D_{\beta}$ for each spatial level (see Figure 1) were averaged across the 10 loci.

To simulate a realistic distribution of number of individuals across populations, we generated random values from a log-normal distribution with mean 0 and log of standard deviation 1; these values were then multiplied by randomly generated deviates from a Poisson distribution with λ = 30 to obtain a wide range of population/community sizes. Rounded values (to mimic abundances of individuals) were then multiplied by $p_{il}$ and $q_{il}$ to generate allele abundances. Given that number of individuals was randomly generated across populations, there is no spatial correlation in abundance of individuals across the landscape, which means that the genetic spatial patterns were solely determined by the variance–covariance matrix used to generate correlated allele frequencies across populations. This facilitates interpretation of the simulation results, allowing us to demonstrate that the framework can uncover subtle spatial effects associated with population connectivity (see below).

For each spatial structure, we generated 100 matrices of allele frequencies, and each matrix was analysed separately to obtain distributions for $D_{\gamma}$, $D_{\alpha}$, $D_{\beta}$ and $\Delta_D$. Figure 2 presents heat maps of the correlation in allele frequencies across populations for one simulated data set under each δ value and shows that our algorithm can generate a wide range of genetic structures comparable to those generated by other more complex simulation protocols (e.g., de Villemereuil, Frichot, Bazin, Francois, & Gaggiotti, 2014).

Figure 3 shows the distribution of $D_{\alpha}$, $D_{\beta}$ and $\Delta_D$ values for the three levels of geographic variation below the ecosystem level (i.e., $D_{\gamma}$ genetic diversity). The results clearly show that our framework detects differences in genetic diversity across different levels of spatial genetic structure. As expected, the effective number of alleles ($D_{\alpha}$ component, top row) increases per region and subregion as the spatial structure becomes weaker (i.e., from small to large δ values) but remains constant at the population level, as there is no spatial structure at this level (i.e., populations are panmictic), so diversity is independent of δ.

The $D_{\beta}$ component (middle row) quantifies the effective number of aggregates (regions, subregions, populations) at each hierarchical level of spatial subdivision. The larger the number of aggregates at a given level, the more heterogeneous that level is. Thus, it is also a measure of compositional dissimilarity at each level. We use this interpretation to describe the results in a more intuitive manner. As expected, as δ increases, dissimilarity between regions (middle left panel) decreases because spatial genetic structure becomes weaker, and the compositional dissimilarity among populations within subregions (middle right panel) increases because the strong spatial correlation among populations within subregions breaks down (Figure 3, centre left panel). The compositional dissimilarity between subregions within regions (Figure 3, middle centre panel) first increases and then decreases with increasing δ. This is due to an "edge effect" associated with the marginal status of the subregions at the extremes of the species range (extreme right and left subregions in Figure 2). As δ increases, the composition of the two subregions at the centre of the species range, which belong to different regions, changes more rapidly than that of the two marginal subregions. Thus, the compositional dissimilarity between subregions within regions increases. However, as δ continues to increase, spatial effects disappear and dissimilarity decreases.

FIGURE 2 Heat maps of allele frequency correlations between pairs of populations for different δ values. Delta values control the strength of the spatial genetic structure among populations, with low δs having the strongest spatial correlation among populations. Each heat map represents the outcome of a single simulation, and each dot represents the allele frequency correlation between two populations. Thus, the diagonal represents the correlation of a population with itself and is always 1, regardless of the δ value considered in the simulation. Colours indicate range of correlation values. As in Figure 1, the dendrograms represent the spatial relationship (i.e., geographic distance) between populations.
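The data-generation steps of the simulation protocol (correlated normal deviates, truncation to [0, 1], and log-normal times Poisson population sizes) can be sketched in base R. This is a simplified illustration, not the authors' simulation code: the exponentially decaying correlation matrix `Sigma` is a stand-in assumption for the variance–covariance matrix described in the text, the decay scale of 4 is arbitrary, and `sdlog = 1` is one reading of the stated log-normal parameters.

```r
set.seed(1)
n_pop <- 32; n_loci <- 10

# Stand-in spatial covariance (assumption): correlation decays with distance along a line
d     <- abs(outer(seq_len(n_pop), seq_len(n_pop), "-"))
Sigma <- exp(-d / 4)

# Correlated N(0, 1) deviates: one column per locus, correlated across populations
eps <- t(chol(Sigma)) %*% matrix(rnorm(n_pop * n_loci), n_pop, n_loci)

# Transform: truncate deviates to [0, 1] to obtain the frequency of allele A1
p <- pmin(pmax(eps, 0), 1)
q <- 1 - p

# Population sizes: log-normal draw times a Poisson(30) draw, then rounded
size <- round(rlnorm(n_pop, meanlog = 0, sdlog = 1) * rpois(n_pop, lambda = 30))

# Allele abundances per population (rows) and locus (columns)
n_A1 <- size * p
n_A2 <- size * q
```

For one locus, `rbind(n_A1[, l], n_A2[, l])` gives an alleles-by-populations matrix of the kind iDIP expects, provided populations with no alleles (size 0) are dropped first.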

The differentiation component $\Delta_D$ (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity $D_{\beta}$. This is expected, as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix in which different δ values would be used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength in spatial genetic variation.

FIGURE 3 Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity gamma is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions).

For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ: $D_{\gamma}$ = 1.6 on average across simulations for δ = 0.1, up to $D_{\gamma}$ = 1.9 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here region, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as for the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.

7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM

All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish, Etelis coruscans (Andrews et al., 2014), and a shallow-water fish, Zebrasoma flavescens (Eble et al., 2011).

The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string

FIGURE 4 Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass


of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km southwest of the MHI, at Johnston Atoll, and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).

We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.

7.1 | Species diversity

Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, $D_\gamma$, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_\alpha^{(2)}$ = 37.77; Island: $D_\alpha^{(1)}$ = 27.75). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_\beta^{(2)}$ = 1.29 represents the number of region equivalents in the Hawaiian archipelago, while $D_\beta^{(1)}$ = 1.361 is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain $\Delta_D$ (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
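The arithmetic behind this paragraph can be checked directly from the values in Table 4. The sketch below is only a consistency check, not the authors' software: it verifies that gamma diversity factors multiplicatively into alpha and beta at each level, and it back-calculates the normaliser implied by each differentiation value under the assumption (ours, not restated in this section) that $\Delta_D$ equals $\ln D_\beta$ divided by an entropy of the region or island weights. All variable names are hypothetical.

```python
import math

# Values reported in Table 4 (fish species diversity, order q = 1).
D_gamma = 48.744          # effective number of species in the archipelago
D_alpha_region = 37.773   # mean effective number of species per region
D_alpha_island = 27.752   # mean effective number of species per island
D_beta_region = 1.290     # reported number of region equivalents
D_beta_island = 1.361     # reported number of island equivalents per region
delta_region = 0.290      # reported differentiation among regions
delta_island = 0.153      # reported differentiation among islands within a region

# Multiplicative decomposition: gamma = alpha * beta at each level.
beta_region = D_gamma / D_alpha_region
beta_island = D_alpha_region / D_alpha_island

# Species equivalents lost when descending one level in the hierarchy.
lost_region = D_gamma - D_alpha_region
lost_island = D_alpha_region - D_alpha_island

# ASSUMPTION: differentiation = ln(beta) / (entropy of the weights).
# Under that assumption, the normaliser implied by the reported values is:
implied_norm_region = math.log(D_beta_region) / delta_region
implied_norm_island = math.log(D_beta_island) / delta_island

print(f"beta (region) = {beta_region:.3f}, beta (island) = {beta_island:.3f}")
print(f"equivalents lost: {lost_region:.2f} and {lost_island:.2f}")
print(f"implied normalisers: {implied_norm_region:.3f}, {implied_norm_island:.3f}")
```

Under the stated assumption, the implied normaliser for the three regions should not exceed ln 3, the value it would take if the regions were equally weighted; the check in the test confirms this for the reported numbers.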

We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than those of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and, to a lesser extent, Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
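To make the role of the order q concrete, the following minimal sketch computes Hill numbers of order 0, 1 and 2 for two made-up assemblages. Only the 55.1% share of the dominant species is taken from the text; the number of species and the way the remaining abundance is split are hypothetical choices for illustration, and the function name `hill_number` is ours.

```python
import math

def hill_number(abundances, q):
    """Hill number (effective number of elements) of order q."""
    positive = [a for a in abundances if a > 0]
    total = sum(positive)
    p = [a / total for a in positive]
    if q == 1:
        # Order 1 is defined as the limit q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

# Hypothetical assemblages with 50 species each.
even = [1 / 50] * 50                       # all species equally common
dominated = [0.551] + [0.449 / 49] * 49    # one species holds 55.1% of individuals

for name, community in (("even", even), ("dominated", dominated)):
    profile = [hill_number(community, q) for q in (0, 1, 2)]
    print(name, [round(d, 2) for d in profile])
```

In the even assemblage all three orders coincide with species richness, whereas in the dominated one the effective number of species falls steeply as q increases, which is the kind of contrast that Figure 5a is read against.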

Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.

7.2 | Genetic diversity

Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases, genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed

TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

Level                      Diversity
3 Hawaiian Archipelago     $D_\gamma = 48.744$
2 Region                   $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)} = 37.773$, $D_\beta^{(2)} = 1.290$
1 Island (community)       $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)} = 27.752$, $D_\beta^{(1)} = 1.361$

Differentiation among aggregates at each level
2 Region                   $\Delta_D^{(2)} = 0.290$
1 Island (community)       $\Delta_D^{(1)} = 0.153$


among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that, despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.

Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.

As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).

8 | DISCUSSION

Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications, including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.

Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and, therefore, give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g.,

FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities; (b) genetic diversity for Etelis coruscans; (c) genetic diversity for Zebrasoma flavescens


bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.
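The contrast drawn above between heterozygosity (order q = 2) and the information-based measure (order q = 1) can be illustrated with a small numerical sketch. It uses the standard relation between expected heterozygosity, He = 1 − Σp², and the order-2 effective number of alleles, 1/(1 − He). The allele frequencies are invented for illustration and the function names are hypothetical.

```python
import math

def order1_diversity(freqs):
    """Effective number of alleles of order q = 1 (exponential of Shannon entropy)."""
    return math.exp(-sum(p * math.log(p) for p in freqs if p > 0))

def expected_heterozygosity(freqs):
    """Expected heterozygosity (gene diversity), 1 minus the sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def order2_diversity(freqs):
    """Effective number of alleles of order q = 2, written via heterozygosity."""
    return 1.0 / (1.0 - expected_heterozygosity(freqs))

# Hypothetical locus before and after ten rare alleles (1% each) appear.
before = [0.5, 0.5]
after = [0.45, 0.45] + [0.01] * 10

gain_q1 = order1_diversity(after) / order1_diversity(before)
gain_q2 = order2_diversity(after) / order2_diversity(before)
print(f"relative increase, order 1: {gain_q1:.2f}x; order 2: {gain_q2:.2f}x")
```

In this made-up case the order-1 measure responds more strongly to the new rare alleles than the heterozygosity-based measure does, which is the sensitivity difference the paragraph describes.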

As proved by Chao et al. (2015, appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost's D, possess neither of these two properties.

Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and

FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness; (b) allelic richness for Etelis coruscans; (c) allelic richness for Zebrasoma flavescens

[Panel titles: (a) Species diversity; (b) Genetic diversity, E. coruscans; (c) Genetic diversity, Z. flavescens]

TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level                      Diversity
3 Hawaiian Archipelago     $D_\gamma = 8.249$
2 Region                   $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)} = 8.083$, $D_\beta^{(2)} = 1.016$
1 Island (population)      $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)} = 7.077$, $D_\beta^{(1)} = 1.117$

Differentiation among aggregates at each level
2 Region                   $\Delta_D^{(2)} = 0.023$
1 Island (community)       $\Delta_D^{(1)} = 0.062$


our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.

Recently, Karlin and Smouse (2017; Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%, whereas ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
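The two-population example above can be reproduced with a few lines of code. The sketch below is a simplified, equal-weight, two-level version of normalized Shannon differentiation, (H_gamma − H_alpha) / ln N, which is what the text says the framework reduces to in the two-level case; it is not the general weighted formula of Table 3, and it does not implement Karlin and Smouse's measure. Function and allele names are hypothetical.

```python
import math

def shannon(freqs):
    """Shannon entropy (natural log) of a set of relative frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def shannon_differentiation(populations):
    """Equal-weight normalized Shannon differentiation among N populations.

    populations: list of dicts mapping allele name -> relative frequency.
    """
    n = len(populations)
    alleles = set().union(*populations)
    # Pooled (gamma) frequencies: unweighted mean of population frequencies.
    pooled = [sum(pop.get(a, 0.0) for pop in populations) / n for a in alleles]
    h_gamma = shannon(pooled)
    h_alpha = sum(shannon(pop.values()) for pop in populations) / n
    return (h_gamma - h_alpha) / math.log(n)

# Populations I and II: 10 equally frequent alleles each, 4 of them shared.
pop_1 = {f"shared_{i}": 0.1 for i in range(4)}
pop_1.update({f"private_I_{i}": 0.1 for i in range(6)})
pop_2 = {f"shared_{i}": 0.1 for i in range(4)}
pop_2.update({f"private_II_{i}": 0.1 for i in range(6)})

differentiation = shannon_differentiation([pop_1, pop_2])
print(f"differentiation = {differentiation:.3f}")  # 1 - A/S = 1 - 4/10
```

The same function returns 1 − A/S for any number of equally weighted populations built in this way, because H_gamma − H_alpha then equals (1 − A/S) ln N.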

In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown, and all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would surely be observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., obtained by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
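The negative bias of the observed (plug-in) entropy under severe undersampling can be seen in a small simulation. The sketch below is purely illustrative: the community (50 equally abundant species), the sample size and the number of replicates are hypothetical choices, and it does not implement the bias-reduction estimator of Chao and Jost (2015).

```python
import math
import random
from collections import Counter

def plugin_entropy(sample):
    """Observed Shannon entropy: sample proportions substituted into the formula."""
    n = len(sample)
    return -sum((c / n) * math.log(c / n) for c in Counter(sample).values())

random.seed(1)            # fixed seed so the run is repeatable

n_species = 50            # hypothetical community, all species equally abundant
true_entropy = math.log(n_species)
sample_size = 30          # fewer individuals than species: severe undersampling
replicates = 200

estimates = [
    plugin_entropy([random.randrange(n_species) for _ in range(sample_size)])
    for _ in range(replicates)
]
mean_estimate = sum(estimates) / replicates

print(f"true entropy          = {true_entropy:.3f}")
print(f"mean observed entropy = {mean_estimate:.3f}")
```

With fewer sampled individuals than species, the observed entropy can never exceed the logarithm of the sample size, so every replicate falls below the true value in this setting.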

Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity $D_\beta$ and differentiation $\Delta_D$ measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.

The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.

Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural

TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

Level                      Diversity
3 Hawaiian Archipelago     $D_\gamma = 8.404$
2 Region                   $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)} = 8.290$, $D_\beta^{(2)} = 1.012$
1 Island (community)       $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)} = 7.690$, $D_\beta^{(1)} = 1.065$

Differentiation among aggregates at each level
2 Region                   $\Delta_D^{(2)} = 0.014$
1 Island (community)       $\Delta_D^{(1)} = 0.033$


populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization, and therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
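The percentages quoted in this comparison follow from the alpha diversities in Tables 4 to 6, expressed as the relative drop from the regional to the island level. The short check below recomputes them from the rounded table values, so small discrepancies in the last digit are expected; the dictionary layout and names are our own.

```python
# (regional alpha, island alpha) of order q = 1, copied from Tables 4, 5 and 6.
alpha = {
    "fish species": (37.773, 27.752),
    "E. coruscans": (8.083, 7.077),
    "Z. flavescens": (8.290, 7.690),
}

# (differentiation among regions, among islands within regions), same tables.
differentiation = {
    "fish species": (0.290, 0.153),
    "E. coruscans": (0.023, 0.062),
    "Z. flavescens": (0.014, 0.033),
}

# Relative drop in diversity (percent) from the regional to the island level.
percent_drop = {
    name: 100.0 * (regional - island) / regional
    for name, (regional, island) in alpha.items()
}

for name, drop in percent_drop.items():
    among_regions, within_regions = differentiation[name]
    stronger = "regions" if among_regions > within_regions else "islands within regions"
    print(f"{name}: drop = {drop:.1f}%, stronger differentiation among {stronger}")
```

The output makes the reversal explicit: differentiation is larger among regions for fish species but larger among islands within regions for both sets of genetic data.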

In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)'s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)'s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
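As a rough illustration of the "simple transformation" mentioned above, the sketch below computes the phylogenetic entropy of Allen et al. (2009), H_P = −Σ L_i a_i ln a_i over branches, and converts it to a mean effective number of lineages as exp(H_P / T) for an ultrametric tree of height T, which is our reading of the order-1 measure of Chao et al. (2010). The two toy trees, their branch lengths and the function name are hypothetical; this is a single-assemblage sketch, not the hierarchical decomposition of Table 3.

```python
import math

def phylogenetic_diversity_q1(branches, tree_height):
    """Phylogenetic entropy and its order-1 effective number of lineages.

    branches: list of (length, abundance) pairs, one per branch, where
        abundance is the summed relative abundance of the tips descending
        from that branch. Assumes an ultrametric tree of the given height.
    """
    entropy = -sum(length * a * math.log(a) for length, a in branches if a > 0)
    return entropy, math.exp(entropy / tree_height)

height = 2.0

# Star tree: four equally abundant species that are all equally distinct.
star = [(height, 0.25)] * 4

# Same four species, but A and B are close relatives sharing a long stem.
clustered = [
    (0.5, 0.25), (0.5, 0.25),   # short tip branches of A and B
    (1.5, 0.50),                # stem shared by A and B
    (2.0, 0.25), (2.0, 0.25),   # long tip branches of C and D
]

_, d_star = phylogenetic_diversity_q1(star, height)
_, d_clustered = phylogenetic_diversity_q1(clustered, height)
print(f"star tree: {d_star:.2f} effective lineages")
print(f"clustered tree: {d_clustered:.2f} effective lineages")
```

On the star tree the measure equals the ordinary Hill number of order 1, since relatedness carries no information there; shared ancestry between A and B lowers the effective number of lineages below four.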

There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level is transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithm may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.

The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.

An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.


ACKNOWLEDGEMENTS

This work was assisted through participation in "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.

DATA ARCHIVING STATEMENT

All data used in this manuscript are available in DRYAD (https://doi.org/dx.doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).

ORCID

Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493

Christine Edwards http://orcid.org/0000-0001-8837-4872

REFERENCES

Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2

Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101

Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.

Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665

Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.

Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.

Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654

Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551

Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247

Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272

Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540

Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349

Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471

Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014

Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819

Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083

Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1

Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2

Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036

Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75

Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352

Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x

Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1

Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x

Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1

Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x

Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum × falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.


Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.

Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.

Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3

Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x

Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.

Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299

Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x

Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x

Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.

Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x

Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.

Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.

Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082

Scheiner S M Kosman E Presley S J ampWillig M R (2017a) Thecomponents of biodiversitywith a particular focus on phylogeneticinformation Ecology and Evolution 7(16) 6444ndash6454 httpsdoiorg101002ece33199

Scheiner S M Kosman E Presley S J amp Willig M R (2017b)Decomposing functional diversityMethods in Ecology and Evolution 8(7)809ndash820httpsdoiorg1011112041-210X12696

SelkoeKAGaggiottiOETremlEAWrenJLKDonovanMKToonenRJampConsortiuHawaiiReefConnectivity(2016)TheDNAofcoralreefbiodiversityPredictingandprotectinggeneticdiversityofreef assemblages Proceedings of the Royal Society B- Biological Sciences 283(1829)

SelkoeKAHalpernBSEbertCMFranklinECSeligERCaseyKShellipToonenRJ(2009)AmapofhumanimpactstoaldquopristinerdquocoralreefecosystemthePapahānaumokuākeaMarineNationalMonumentCoral Reefs 28(3)635ndash650httpsdoiorg101007s00338-009-0490-z

SherwinWB(2010)Entropyandinformationapproachestogeneticdi-versityanditsexpressionGenomicgeographyEntropy 12(7)1765ndash1798httpsdoiorg103390e12071765

SherwinWBJabotFRushRampRossettoM (2006)Measurementof biological information with applications from genes to land-scapes Molecular Ecology 15(10) 2857ndash2869 httpsdoiorg101111j1365-294X200602992x

SlatkinM ampHudson R R (1991) Pairwise comparisons ofmitochon-drialDNAsequencesinstableandexponentiallygrowingpopulationsGenetics 129(2)555ndash562

SmousePEWhiteheadMRampPeakallR(2015)Aninformationaldi-versityframeworkillustratedwithsexuallydeceptiveorchidsinearlystages of speciationMolecular Ecology Resources 15(6) 1375ndash1384httpsdoiorg1011111755-099812422

VellendMLajoieGBourretAMurriaCKembelSWampGarantD(2014)Drawingecologicalinferencesfromcoincidentpatternsofpop-ulation- and community-level biodiversityMolecular Ecology 23(12)2890ndash2901httpsdoiorg101111mec12756

deVillemereuil P Frichot E Bazin E Francois O amp Gaggiotti O E(2014)GenomescanmethodsagainstmorecomplexmodelsWhenand how much should we trust them Molecular Ecology 23(8)2006ndash2019httpsdoiorg101111mec12705

WeirBSampCockerhamCC(1984)EstimatingF-statisticsfortheanaly-sisofpopulationstructureEvolution 38(6)1358ndash1370

WhitlockMC(2011)GrsquoSTandDdonotreplaceF-STMolecular Ecology 20(6)1083ndash1091httpsdoiorg101111j1365-294X201004996x

Williams IDBaumJKHeenanAHansonKMNadonMOampBrainard R E (2015)Human oceanographic and habitat drivers ofcentral and western Pacific coral reef fish assemblages PLoS One 10(4)e0120516ltGotoISIgtWOS000352135600033httpsdoiorg101371journalpone0120516

WoldaH(1981)SimilarityindexessamplesizeanddiversityOecologia 50(3)296ndash302

WrightS(1951)ThegeneticalstructureofpopulationsAnnals of Eugenics 15(4)323ndash354

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593


APPENDIX S1 Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
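The equivalence in the table above can be checked numerically. The following is a minimal sketch, assuming the standard definitions He = 1 − Σ p_i² and effective number of alleles 1/(1 − He) = 1/Σ p_i²; the function names `het` and `effective_alleles` are hypothetical helpers, not part of the article's code.

```r
# Allele frequencies copied from the Appendix S1 table
pop1 <- c(A1 = 0.5,  A2 = 0.5,  A3 = 0,     A4 = 0,     A5 = 0)
pop2 <- c(A1 = 0.01, A2 = 0.10, A3 = 0.665, A4 = 0.215, A5 = 0.01)

# Expected heterozygosity: He = 1 - sum(p_i^2)
het <- function(p) 1 - sum(p^2)

# Heterozygosity-based effective number of alleles: 1 / (1 - He) = 1 / sum(p_i^2)
effective_alleles <- function(p) 1 / sum(p^2)

he <- c(pop1 = het(pop1), pop2 = het(pop2))
ne <- c(pop1 = effective_alleles(pop1), pop2 = effective_alleles(pop2))

print(round(he, 3))  # both populations have He close to 0.5
print(round(ne, 2))  # both populations are close to two effective alleles
```

Population 2 carries five distinct alleles, yet its heterozygosity-based effective number is about two, which is what places it in the same equivalence class as population 1.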

APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol                      Definition
$^{q}D$                     abundance diversity of order q, also referred to as Hill number of order q
$^{q}PD$                    phylogenetic diversity of order q
$S$                         total number of distinct elements (species/alleles)
$H$                         Shannon entropy
$K$                         number of regions
$J_k$                       number of local populations/communities within region k
$w_{jk}$                    weight given to population/community j of region k
$w_{+k}$                    weight given to region k
$D_\alpha^{(l)}$            alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$             beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$            gamma abundance diversity at level l of the hierarchy
$l$                         superscript to denote population/community, subregion, region, …
$D_\gamma$                  gamma abundance diversity at the ecosystem level
$N_{injk}$                  number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$                   total number of copies of allele/species i in population/community j of region k
$N_{+jk}$                   total number of alleles/individuals in population j of region k
$N_{++k}$                   total number of alleles/individuals in region k
$N_{+++}$                   total number of alleles/individuals in the ecosystem
$p_{i|jk}$                  frequency of allele/species i in population j of region k
$p_{i|+k}$                  frequency of allele/species i in region k
$p_{i|++}$                  frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$       alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$        alpha entropy for region k
$H_\alpha^{(l)}$            total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$            abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$                         number of branch segments in the phylogenetic tree
$L_i$                       length of branch i = 1, 2, 3, …, B
$a_i$                       total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$                   mean branch length
$T$                         depth of an ultrametric tree ($= \bar{T}$)
$Q$                         Rao's quadratic entropy
$I$                         phylogenetic entropy
$a_{i|jk}$                  total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$                  total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$                  total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$           alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$            beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$           gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$                 gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$       alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$        alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$            total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$         phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)

APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1. For the gamma and alpha entropies defined above for the two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \ln x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows. Because $\sum_{m=1}^{J} w_m p_{i|m} \ge w_j p_{i|j}$ for every j and the logarithm is increasing,

$$H_\gamma = -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)\Big] \le -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\big(w_j p_{i|j}\big)\Big]$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
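The two extreme cases of the theorem can be verified numerically. The sketch below uses hypothetical weights and hypothetical allele frequency matrices (columns are populations); `shannon` and `entropy_gap` are illustrative helper names, not functions from the article's code.

```r
# Shannon entropy of a frequency vector (zero frequencies contribute nothing)
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# H_gamma - H_alpha for a matrix P (alleles in rows, populations in columns)
# and a vector of population weights w that sums to 1
entropy_gap <- function(P, w) {
  h_gamma <- shannon(as.vector(P %*% w))        # entropy of the pooled frequencies
  h_alpha <- sum(w * apply(P, 2, shannon))      # weighted mean of within-population entropies
  h_gamma - h_alpha
}

w <- c(0.2, 0.3, 0.5)   # hypothetical population weights
upper <- shannon(w)     # upper bound: -sum(w_j * ln w_j)

# Case 1: all populations share the same allele frequency distribution
P_same <- cbind(c(0.5, 0.3, 0.2), c(0.5, 0.3, 0.2), c(0.5, 0.3, 0.2))

# Case 2: completely distinct populations (no shared alleles)
P_distinct <- rbind(cbind(c(0.6, 0.4), 0, 0),
                    cbind(0, c(0.1, 0.9), 0),
                    cbind(0, 0, c(0.5, 0.5)))

# Case 3: partially overlapping populations
P_mixed <- cbind(c(0.5, 0.5, 0), c(0, 0.5, 0.5), c(1/3, 1/3, 1/3))

gap_same     <- entropy_gap(P_same, w)      # attains the lower bound 0
gap_distinct <- entropy_gap(P_distinct, w)  # attains the upper bound
gap_mixed    <- entropy_gap(P_mixed, w)     # lies strictly between the bounds
```

Dividing any of these gaps by `upper` gives a value between 0 and 1, which is the normalization used for the differentiation measure.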

From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:

(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.

(A2) The Shannon differentiation measure is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: If multiple communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.
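Property (B) is easy to check on a constructed example. The sketch below builds N hypothetical communities with S equally common species each, A of which are shared by all communities, and evaluates Eq. (C1) with equal weights; `make_communities` and `shannon_diff` are illustrative names introduced here.

```r
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Frequency matrix (species in rows, communities in columns):
# each community has S equally common species; the first A species are shared
# by all communities, the remaining S - A species are unique to one community
make_communities <- function(N, S, A) {
  n_species <- A + N * (S - A)
  P <- matrix(0, nrow = n_species, ncol = N)
  if (A > 0) P[seq_len(A), ] <- 1 / S
  for (j in seq_len(N)) {
    if (S > A) {
      own <- A + (j - 1) * (S - A) + seq_len(S - A)
      P[own, j] <- 1 / S
    }
  }
  P
}

# Shannon differentiation of Eq. (C1)
shannon_diff <- function(P, w) {
  h_gamma <- shannon(as.vector(P %*% w))
  h_alpha <- sum(w * apply(P, 2, shannon))
  (h_gamma - h_alpha) / shannon(w)
}

P <- make_communities(N = 4, S = 10, A = 3)
w <- rep(1 / 4, 4)
delta <- shannon_diff(P, w)
print(delta)   # equals 1 - A/S = 0.7 up to rounding error
```

With A = S the communities are identical and the measure is 0; with A = 0 they share nothing and the measure is 1.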

Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived, so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in the three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\Big(\frac{w_{jk}}{w_{+k}}\Big). \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions, with allele relative frequency $p_{i|+k}$ for allele i in region k, and with the region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} w_{+k} p_{i|+k} \ln\Big(\sum_{m=1}^{K} w_{+m} p_{i|+m}\Big)\Big].$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations, with allele relative frequency $p_{i|jk}$ for allele i in population j, and with population weights $(w_{1k}/w_{+k}, w_{2k}/w_{+k}, \ldots, w_{J_k k}/w_{+k})$, j = 1, 2, …, $J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k}) p_{i|jk}$. The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\big(w_{jk}/w_{+k}\big)}. \tag{C6}$$
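Eqs. (C5) and (C6) can be evaluated directly on a small data set. The sketch below uses the five-population, two-region allele count matrix from the R code example in the supplementary information of this article, and assumes (as the article's iDIP function does) that weights are proportional to sample sizes. The variable names are illustrative.

```r
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Allele counts (alleles in rows, populations in columns)
abun <- cbind(Pop1 = c(1, 0, 7, 0, 2, 0),
              Pop2 = c(16, 0, 12, 5, 1, 1),
              Pop3 = c(2, 0, 11, 14, 0, 3),
              Pop4 = c(10, 5, 1, 1, 11, 2),
              Pop5 = c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)   # region membership of each population

n    <- sum(abun)
w_jk <- colSums(abun) / n                       # population weights w_jk
w_k  <- as.vector(tapply(w_jk, region, sum))    # region weights w_+k

# Pooled allele counts per region (alleles in rows, regions in columns)
region_abun <- sapply(sort(unique(region)),
                      function(k) rowSums(abun[, region == k, drop = FALSE]))

h_gamma  <- shannon(rowSums(abun) / n)
h_alpha2 <- sum(w_k  * apply(region_abun, 2, function(x) shannon(x / sum(x))))
h_alpha1 <- sum(w_jk * apply(abun,        2, function(x) shannon(x / sum(x))))

# Eq. (C5): differentiation among regions
delta2 <- (h_gamma - h_alpha2) / shannon(w_k)
# Eq. (C6): differentiation among populations within regions
delta1 <- (h_alpha2 - h_alpha1) / (-sum(w_jk * log(w_jk / w_k[region])))

print(round(c(D_gamma = exp(h_gamma), delta2 = delta2, delta1 = delta1), 3))
```

The values agree with the iDIP output reported for the same data in the supplementary information (gamma diversity 5.272, differentiation 0.204 among regions and 0.310 among populations within regions).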

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$H_\gamma = -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\big(w_{jk} p_{i|jk}\big)\Big]$$

$$= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big(\ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big)\Big]$$

$$= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk}\Big] - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \frac{w_{jk}}{w_{+k}}\Big] - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k}\Big]$$

$$= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k}$$

$$\equiv H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big].$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[I_\alpha^{(2)} - I_\alpha^{(1)}\big] + \big[I_\gamma - I_\alpha^{(2)}\big]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem, so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\Big(\frac{w_{jk}}{w_{+k}}\Big). \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
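The upper bound can be illustrated in the simplest (two-level) setting with a toy ultrametric tree. In the sketch below, two hypothetical populations occupy disjoint clades of depth T = 5, so no branch is shared between them; the branch lengths, abundances and weights are invented for illustration, and `phylo_entropy` is a helper name introduced here.

```r
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Phylogenetic entropy I = -sum_i L_i * a_i * ln(a_i) over branches with a_i > 0
phylo_entropy <- function(L, a) {
  keep <- a > 0
  -sum(L[keep] * a[keep] * log(a[keep]))
}

# Hypothetical ultrametric tree of depth 5 with six branches:
# clade 1 = two tips (length 2) plus a stem (length 3),
# clade 2 = two tips (length 1) plus a stem (length 4)
L <- c(2, 2, 3, 1, 1, 4)
tree_depth <- 5

# Node/branch abundances a_{i|j}: population 1 lives only in clade 1,
# population 2 only in clade 2 (stems carry the total abundance of their tips)
a <- cbind(pop1 = c(0.3, 0.7, 1, 0, 0, 0),
           pop2 = c(0, 0, 0, 0.4, 0.6, 1))
w <- c(0.25, 0.75)   # hypothetical population weights

i_gamma <- phylo_entropy(L, as.vector(a %*% w))
i_alpha <- sum(w * apply(a, 2, function(col) phylo_entropy(L, col)))

gap   <- i_gamma - i_alpha
bound <- tree_depth * shannon(w)   # -T * sum(w_j * ln w_j)
print(c(gap = gap, bound = bound)) # the gap attains the bound
```

Because the populations are phylogenetically completely distinct, the normalized phylogenetic differentiation `gap / bound` equals 1 here.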


SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represent the ecosystem. Shannon entropies $H_{*}$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using Eq. 1.

[Figure S1 diagram. Columns: Within, Between, Total, Decomposition. Rows: level 3 (Ecosystem), level 2 (Region), level 1 (Population or Community). Populations 1 to 3 belong to region 1 and populations 4 and 5 to region 2, with frequencies $p_{i|jk}$, $p_{i|+k}$ and $p_{i|++}$ and entropies $H_{\alpha jk}^{(1)}$, $H_{\alpha k}^{(2)}$, $H_\alpha^{(1)}$, $H_\alpha^{(2)}$ and $H_\gamma$ at the corresponding levels.]


Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/alleles abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5, p_1 + p_2, p_1 + p_2 + p_3, p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2 diagram. Tips: $p_1$, $p_2$, $p_3$, $p_4$, $p_5$. Internal nodes: $p_1 + p_2$, $p_4 + p_5$, $p_1 + p_2 + p_3$. Root: MRCA.]


R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1
        Population 1
        Population 2
    Region 2
        Population 3
        Population 4
        Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

           Pop1   Pop2   Pop3   Pop4   Pop5
Allele1       1     16      2     10     15
Allele2       0      0      0      5     14
Allele3       7     12     11      1      0
Allele4       0      5     14      1     21
Allele5       2      1      0     11     10
Allele6       0      1      3      2      0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

           Pop1          Pop2          Pop3          Pop4          Pop5
Level3     Ecosystem     Ecosystem     Ecosystem     Ecosystem     Ecosystem
Level2     Region1       Region1       Region2       Region2       Region2
Level1     Population1   Population2   Population3   Population4   Population5

For the above simple example, input data for R function iDIP (code given below) are shown below.

# Input data
Data=cbind(c(1,0,7,0,2,0),c(16,0,12,5,1,1),c(2,0,11,14,0,3),c(10,5,1,1,11,2),c(15,14,0,21,10,0))
Struc=rbind(rep("Ecosystem",5),c(rep("Region1",2),rep("Region2",3)),paste0("Population",1:5))

# Run R function
iDIP(Data,Struc)


Output is given below.

                     [,1]
D_gamma             5.272
D_alpha.2           4.679
D_alpha.1           3.540
D_beta.2            1.127
D_beta.1            1.322
Proportion.2        0.300
Proportion.1        0.700
Differentiation.2   0.204
Differentiation.1   0.310

We give simple interpretations for the above output (all the "effective" is in a sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 is interpreted as that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha.2 = 4.679 is interpreted as that each region contains 4.679 allele equivalents. D_beta.2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).

(1c) D_alpha.1 = 3.540 is interpreted as that each population within a region contains 3.540 allele equivalents. D_beta.1 = 1.322 is interpreted as that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha.2).

(2) Proportion.2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion.1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation.2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation.1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
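The reported quantities are linked by simple identities, which the following sketch reproduces from the rounded gamma and alpha values alone. It assumes, as in the iDIP listing, that each beta diversity is the ratio of the diversity one level up to the alpha diversity at that level, and that the proportions are shares of the total beta information, ln(D_gamma) − ln(D_alpha at level 1); the variable names are illustrative.

```r
# Rounded values taken from the iDIP output above
d_gamma  <- 5.272
d_alpha2 <- 4.679
d_alpha1 <- 3.540

# Multiplicative decomposition: gamma = alpha.1 x beta.1 x beta.2
beta2 <- d_gamma  / d_alpha2   # region equivalents
beta1 <- d_alpha2 / d_alpha1   # population equivalents per region

# Beta information is additive on the log (entropy) scale
total_beta_info <- log(d_gamma) - log(d_alpha1)
prop2 <- log(beta2) / total_beta_info   # share found among regions
prop1 <- log(beta1) / total_beta_info   # share found among populations within regions

print(round(c(beta2 = beta2, beta1 = beta1, prop2 = prop2, prop1 = prop1), 3))
```

Up to rounding of the inputs, this recovers D_beta.2 = 1.127, D_beta.1 = 1.322, Proportion.2 = 0.300 and Proportion.1 = 0.700.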

# R code for Information-based Diversity Partitioning for Any Number of Levels
# input data:
#   abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
# output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level
#   (2) Proportion of total beta information found at each level
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation.1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation.2 measures the mean dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.

# Main function iDIP
iDIP=function(abun,struc){
  n=sum(abun);N=ncol(abun);
  ga=rowSums(abun);
  gp=ga[ga>0]/n;
  G=sum(-gp*log(gp));                      # gamma (ecosystem-level) Shannon entropy
  S=length(gp);
  H=nrow(struc);                           # number of hierarchical levels
  A=numeric(H-1);W=numeric(H-1);B=numeric(H-1);
  Diff=numeric(H-1);Prop=numeric(H-1);
  # lowest level (populations/communities): weights, entropy of weights, alpha entropy
  wi=colSums(abun)/n;
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]));
  pi=sapply(1:N,function(k) abun[,k]/sum(abun[,k]));
  Ai=sapply(1:N,function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])));
  A[H-1]=sum(wi*Ai);
  # intermediate levels: pool the populations that belong to the same aggregate
  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,]);NN=length(I);
      ai=matrix(0,ncol=NN,nrow=nrow(abun));
      for(j in 1:NN){
        II=which(struc[i,]==I[j]);
        if(length(II)==1){ai[,j]=abun[,II];
        }else{ai[,j]=rowSums(abun[,II])}
      }
      pi=sapply(1:NN,function(k) ai[,k]/sum(ai[,k]));
      wi=colSums(ai)/sum(ai);
      W[i-1]=-sum(wi*log(wi))
      Ai=sapply(1:NN,function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])));
      A[i-1]=sum(wi*Ai);
    }
  }
  # beta information, normalized differentiation, proportions and beta diversities
  total=G-A[H-1];
  Diff[1]=(G-A[1])/W[1];
  Prop[1]=(G-A[1])/total;
  B[1]=exp(G)/exp(A[1]);
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1]);
      Prop[i]=(A[i-1]-A[i])/total;
      B[i]=exp(A[i-1])/exp(A[i]);
    }
  }
  Gamma=exp(G);Alpha=exp(A);Diff=Diff;Prop=Prop;
  out=matrix(c(Gamma,Alpha,B,Prop,Diff),ncol=1)
  rownames(out)<-c(paste0("D_gamma"),
                   paste0("D_alpha.",(H-1):1),
                   paste0("D_beta.",(H-1):1),
                   paste0("Proportion.",(H-1):1),
                   paste0("Differentiation.",(H-1):1)
  )
  return(out)
}


Phylogenetic Diversity (R function iDIP.phylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described in the species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree format. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see the simple example given above for the species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundances data as in the species diversity for illustration. A simulated phylogenetic tree for 6 species is given below.

# Input data
Data=cbind(c(1,0,7,0,2,0),c(16,0,12,5,1,1),c(2,0,11,14,0,3),c(10,5,1,1,11,2),c(15,14,0,21,10,0))
rownames(Data)=paste0("Allele",1:6)
Struc=rbind(rep("Ecosystem",5),c(rep("Region1",2),rep("Region2",3)),paste0("Community",1:5))
Tree=c("(((Allele1:1.666254448,Allele2:2.886156926):4.370264926,Allele3:5.919367445):4.349065302,(Allele4:0.967060281,Allele5:4.965919121,Allele6:1.5361314):5.492297125);")

# Run R function
iDIP.phylo(Data,Struc,Tree)

Output is given below.

                 [,1]
Faith's PD    32.1525
mean_T         9.4169
PD_gamma      27.4388
PD_alpha.2    25.5194
PD_alpha.1    22.3231
PD_beta.2      1.075
PD_beta.1      1.143
PD_prop.2      0.351
PD_prop.1      0.649
PD_diff.2      0.124
PD_diff.1      0.149

The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 32.1525.

(2) The weighted (by species abundance) mean of the distances from root node to each of the tips is 9.4169.

(3a) PD_gamma = 27.4388 is interpreted as that the effective total branch length in the ecosystem (total phylogenetic diversity) is 27.4388.

(3b) PD_alpha.2 = 25.5194 is interpreted as that the effective total branch length per region is 25.5194. PD_beta.2 = 1.075 means that there are 1.08 region equivalents. Thus 25.5194 x 1.075 = 27.4388 (= PD_gamma).

(3c) PD_alpha.1 = 22.3231 is interpreted as that the effective total branch length per population within each region is 22.3231. PD_beta.1 = 1.143 implies that there are 1.14 population equivalents per region. Here 22.3231 x 1.143 = 25.5194 (= PD_alpha.2).

(4) PD_prop.2 = 0.351 means that the proportion of total phylogenetic beta information found in the regional level is 35.1%. PD_prop.1 = 0.649 means that the proportion of total phylogenetic beta information found in the community level is 64.9%.

(5) PD_diff.2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff.1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.

# R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
# input data:
#   abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
#   tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.
# output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
#   (2) mean_T: weighted (by species abundance) mean of the distances from root node
#       to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level
#   (4) PD_prop.1 and PD_prop.2 measure the proportions of total phylogenetic beta
#       information found in Level 1 and Level 2, respectively.
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example,
#       PD_diff.1 measures the mean phylogenetic dissimilarity among communities (Level 1)
#       within a region (Level 2); PD_diff.2 measures the mean phylogenetic dissimilarity
#       among regions (Level 2), etc.

# Three packages "ade4", "ape" and "phytools" must be installed first.
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

# Main function iDIP.phylo
iDIP.phylo=function(abun,struc,tree){
  phyloData<-newick2phylog(tree)
  Temp<-as.matrix(abun[names(phyloData$leaves),])
  nodenames=c(names(phyloData$leaves),names(phyloData$nodes));
  # M[i,] flags every node/branch on the path from the root to leaf i
  M=matrix(0,nrow=length(phyloData$leaves),ncol=length(nodenames),dimnames=list(names(phyloData$leaves),nodenames))
  for(i in 1:length(phyloData$leaves)){M[i,][unlist(phyloData$paths[i])]=rep(1,length(unlist(phyloData$paths[i])))}
  # pA: node-by-population abundances; pB: branch lengths of leaves and internal nodes
  pA=matrix(0,ncol=ncol(abun),nrow=length(nodenames),dimnames=list(nodenames,colnames(abun)))
  for(i in 1:ncol(abun)){pA[,i]=Temp[,i]%*%M;}
  pB=c(phyloData$leaves,phyloData$nodes)
  n=sum(abun);N=ncol(abun);
  ga=rowSums(pA);
  gp=ga/n;TT=sum(gp*pB);                   # TT: abundance-weighted mean branch length
  G=sum(-pB[gp>0]*gp[gp>0]/TT*log(gp[gp>0]/TT));
  PD=sum(pB[gp>0]);                        # Faith's PD
  H=nrow(struc);
  A=numeric(H-1);W=numeric(H-1);B=numeric(H-1);
  Diff=numeric(H-1);Prop=numeric(H-1);
  wi=colSums(abun)/n;
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]));
  pi=sapply(1:N,function(k) pA[,k]/sum(abun[,k]));
  Ai=sapply(1:N,function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)));
  A[H-1]=sum(wi*Ai);
  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,]);NN=length(I);
      pi=matrix(0,ncol=NN,nrow=nrow(pA));ni=numeric(NN);
      for(j in 1:NN){
        II=which(struc[i,]==I[j]);
        if(length(II)==1){pi[,j]=pA[,II]/sum(abun[,II]);ni[j]=sum(abun[,II])
        }else{pi[,j]=rowSums(pA[,II])/sum(abun[,II]);ni[j]=sum(abun[,II])}
      }
      # pi=sapply(1:NN,function(k) ai[,k]/sum(ai[,k]));
      wi=ni/sum(ni);
      W[i-1]=-sum(wi*log(wi))
      Ai=sapply(1:NN,function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)));
      A[i-1]=sum(wi*Ai);
    }
  }
  total=G-A[H-1];
  Diff[1]=(G-A[1])/W[1];
  Prop[1]=(G-A[1])/total;
  B[1]=exp(G)/exp(A[1]);
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1]);
      Prop[i]=(A[i-1]-A[i])/total;
      B[i]=exp(A[i-1])/exp(A[i]);
    }
  }
  # Gamma=exp(G)/TT;Alpha=exp(A)/TT;Diff=Diff;Prop=Prop;
  Gamma=exp(G);Alpha=exp(A);Diff=Diff;Prop=Prop;
  out=matrix(c(PD,TT,Gamma,Alpha,B,Prop,Diff),ncol=1)
  # out1=iDIP(abun,struc)
  # out=cbind(out1,out2)
  rownames(out)<-c(paste("Faith's PD"),
                   paste("mean_T"),
                   paste0("PD_gamma"),
                   paste0("PD_alpha.",(H-1):1),
                   paste0("PD_beta.",(H-1):1),
                   paste0("PD_prop.",(H-1):1),
                   paste0("PD_diff.",(H-1):1)
  )
  return(out)
}


reference point is the age of the root of the phylogenetic tree spanned by all elements. Assume that there are B branch segments in the tree and thus there are B corresponding nodes, B ≥ S. The set of species/alleles is expanded to include also the internal nodes as well as the terminal nodes representing species/alleles, which will then be the first S elements (see Figure S2).

Let $L_i$ denote the length of branch i in the tree, i = 1, 2, …, B. We first expand the set of relative abundances of elements $(p_1, p_2, \cdots, p_S)$ (see Equation 1) to a larger set $\{a_i, i = 1, 2, \cdots, B\}$ by defining $a_i$ as the total relative abundance of the elements descended from the ith node/branch, i = 1, 2, …, B. In phylogenetic diversity, an important parameter is the mean branch length $\bar{T}$, the abundance-weighted mean of the distances from the tree base to each of the terminal branch tips, that is, $\bar{T} = \sum_{i=1}^{B} L_i a_i$. For an ultrametric tree, the mean branch length is simply reduced to the tree depth T; see Figure 1 in Chao, Chiu, and Jost (2010) for an example. For simplicity, our following formulation of phylogenetic diversity is based on ultrametric trees. The extension to nonultrametric trees is straightforward (via replacing T by $\bar{T}$ in all formulas).

Chao et al. (2010, 2014) generalized Hill numbers to a class of phylogenetic diversity of order q, ${}^{q}PD$, derived as

$$ {}^{q}PD = \left[ \sum_{i=1}^{B} L_i \left( \frac{a_i}{T} \right)^{q} \right]^{1/(1-q)}. \tag{4} $$

This measure quantifies the effective total branch length during the time interval from T years ago to the present. If q = 0, then ${}^{0}PD = \sum_{i=1}^{B} L_i$, which is the well-known Faith's PD, the sum of the branch lengths of a phylogenetic tree connecting all species. However, this measure does not consider species abundances. Rao's quadratic entropy Q (Rao & Nayak, 1985) is a widely used measure which takes into account both phylogeny and species abundances. This measure is a generalization of the Gini–Simpson index and quantifies the average phylogenetic distance between any two individuals randomly selected from the assemblage. Chao et al. (2010) showed that the ${}^{q}PD$ measure of order q = 2 is a simple transformation of quadratic entropy, that is, ${}^{2}PD = T/(1 - Q/T)$. Again, here we focus on the ${}^{q}PD$ measure of order q = 1, which can be expressed as a function of the phylogenetic entropy (Allen, Kon, & Bar-Yam, 2009):

$$ {}^{1}PD = \lim_{q \to 1} {}^{q}PD = \exp\left[ -\sum_{i=1}^{B} L_i \frac{a_i}{T} \ln\left( \frac{a_i}{T} \right) \right] \equiv T \exp(I/T). \tag{5} $$

Here I denotes the phylogenetic entropy,

$$ I = -\sum_{i=1}^{B} L_i a_i \ln a_i, \tag{6} $$

which is a generalization of Shannon's entropy that incorporates phylogenetic distances among elements. Note that when there are only tip nodes and all branches have unit length, then we have T = 1 and ${}^{q}PD$ reduces to the Hill number of order q (in Equation 1).

4.2.2 | Phylogenetic diversity decomposition in a multiple-level hierarchically structured system

The single-aggregate formulation can be extended to consider a hierarchical spatially structured system. For the sake of simplicity, we consider three levels (ecosystem, region and community/population), as we did for the species/allelic diversity decomposition. Assume that there are S elements in the ecosystem. For the rooted phylogenetic tree spanned by all S elements in the ecosystem, we define the root (or a time reference point), the number of nodes/branches B and the branch length $L_i$ in a similar manner as those in a single aggregate.

For the tip nodes, as in the framework of species and allelic diversity (in Table 2), define $p_{i|jk}$, $p_{i|+k}$ and $p_{i|++}$, i = 1, 2, …, S, as the ith species or allele relative frequencies at the population, regional and ecosystem level, respectively. To expand these relative frequencies to the branch set, we define $a_{i|jk}$, i = 1, 2, …, B, as the summed relative abundance of the species/alleles descended from the ith node/branch in population j and region k, with similar definitions for $a_{i|+k}$ and $a_{i|++}$, i = 1, 2, …, B; see Figure 1 of Chao et al. (2015) for an illustrative example. The decomposition for phylogenetic diversity is similar to that for Hill numbers presented in Table 1, except that now all measures are replaced by phylogenetic diversity. The corresponding phylogenetic gamma, alpha and beta diversities at each level are

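TABLE 1 Various diversities in a hierarchically structured system and their decomposition based on diversity measure $D = {}^{1}D$ (Hill number of order q = 1 in Equation 2); for phylogenetic diversity decomposition, replace D with $PD = {}^{1}PD$ (phylogenetic diversity measure of order q = 1 in Equation 5); see Table 3 for all formulas for D and PD. The superscripts (1) and (2) denote the hierarchical level of focus.

| Hierarchical level | Diversity: within | Diversity: between | Diversity: total | Decomposition |
| 3 Ecosystem | − | − | $D_\gamma$ | $D_\gamma = D_\alpha^{(1)} D_\beta^{(1)} D_\beta^{(2)}$ |
| 2 Region | $D_\alpha^{(2)}$ | $D_\beta^{(2)} = D_\gamma^{(2)} / D_\alpha^{(2)}$ | $D_\gamma^{(2)} = D_\gamma$ | $D_\gamma = D_\alpha^{(2)} D_\beta^{(2)}$ |
| 1 Community or population | $D_\alpha^{(1)}$ | $D_\beta^{(1)} = D_\gamma^{(1)} / D_\alpha^{(1)}$ | $D_\gamma^{(1)} = D_\alpha^{(2)}$ | $D_\alpha^{(2)} = D_\alpha^{(1)} D_\beta^{(1)}$ |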

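TABLE 2 Calculation of allele/species relative frequencies at the different levels of the hierarchical structure

| Hierarchical level | Species/allele relative frequency |
| Population | $p_{i \mid jk} = N_{ijk} / N_{+jk} = N_{ijk} / \sum_{i=1}^{S} N_{ijk}$ |
| Region | $p_{i \mid +k} = N_{i+k} / N_{++k} = \sum_{j=1}^{J_k} (w_{jk} / w_{+k}) \, p_{i \mid jk}$ |
| Ecosystem | $p_{i \mid ++} = N_{i++} / N_{+++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \, p_{i \mid jk}$ |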


given in Table 3, along with the corresponding differentiation measures. Appendix S3 presents all mathematical derivations and discusses the desirable monotonicity and "true dissimilarity" properties that our proposed differentiation measures possess.
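The single-aggregate measure of Equations 4 to 6 can be sketched numerically. The Python snippet below is only an illustration and is not the authors' code: the function names and the three-species ultrametric tree `((A:1,B:1):1,C:2)` with tip abundances 0.5, 0.25 and 0.25 are assumptions made for the example.

```python
import math


def phylo_diversity(branches, q, depth):
    """Return qPD (Equation 4, or its q = 1 limit in Equation 5).

    branches: list of (L_i, a_i) pairs, one per node/branch.
    depth: tree depth T of an ultrametric tree.
    """
    used = [(length, a) for length, a in branches if a > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(length * (a / depth) * math.log(a / depth)
                             for length, a in used))
    return sum(length * (a / depth) ** q for length, a in used) ** (1.0 / (1.0 - q))


def phylo_entropy(branches):
    """Phylogenetic entropy I of Equation 6."""
    return -sum(length * a * math.log(a) for length, a in branches if a > 0)


# Hypothetical tree ((A:1,B:1):1,C:2): tips A, B, the internal branch AB, tip C.
branches = [(1.0, 0.5), (1.0, 0.25), (1.0, 0.75), (2.0, 0.25)]
depth = sum(length * a for length, a in branches)  # mean branch length = T here

faith_pd = phylo_diversity(branches, 0, depth)     # q = 0: sum of branch lengths
pd_1 = phylo_diversity(branches, 1, depth)         # Equation 5, left-hand form
pd_1_alt = depth * math.exp(phylo_entropy(branches) / depth)  # T * exp(I / T)
pd_2 = phylo_diversity(branches, 2, depth)
```

Under these assumptions, the q = 0 value equals the total branch length, and the two forms of Equation 5 agree because the abundance-weighted branch lengths sum to T.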

5 | IMPLEMENTATION OF THE FRAMEWORK BY MEANS OF AN R PACKAGE

The framework described above has been implemented in the R function iDIP (information-based Diversity Partitioning), which is provided as Data S1. We also provide a short introduction with a simple example data set to explain how to obtain numerical results equivalent to those provided in Tables 4 and 5 below for the Hawaiian archipelago example data set.

The R function iDIP requires two input matrices:

1. Abundance data specifying species/alleles (rows) raw or relative abundances for each population/community (columns).

2. Structure matrix describing the hierarchical structure of spatial subdivision; see a simple example given in Data S1. There is no limit to the number of spatial subdivisions.

The output includes (i) gamma (or total) diversity, and alpha and beta diversity for each level; (ii) the proportion of total beta information (among aggregates) found at each level; and (iii) the mean differentiation (dissimilarity) at each level.

We also provide the R function iDIPphylo, which implements an information-based decomposition of phylogenetic diversity and, therefore, can take into account the evolutionary history of the species being studied. This function requires the two matrices mentioned above plus a phylogenetic tree in Newick format. For interested users without knowledge of R, we also provide an online version available from https://chao.shinyapps.io/iDIP/. This interactive web application was developed using Shiny (https://shiny.rstudio.com). The webpage contains tabs providing a short introduction describing how to use the tool, along with a detailed User's Guide, which provides proper interpretations of the output through numerical examples.
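To make the two inputs and the three kinds of output concrete, here is a minimal Python sketch of the three-level decomposition in Tables 1 to 3. It is not the iDIP function itself: the function name `decompose`, the list-based input layout and the toy data are assumptions, and every population is assumed to be non-empty.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log); zero entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def decompose(abun, region_of):
    """abun[i][j]: count of species/allele i in population j.
    region_of[j]: region label of population j (hypothetical layout)."""
    n_pops = len(region_of)
    pop_totals = [sum(row[j] for row in abun) for j in range(n_pops)]
    total = sum(pop_totals)
    w_pop = [t / total for t in pop_totals]                       # w_jk
    regions = sorted(set(region_of))
    members = {r: [j for j in range(n_pops) if region_of[j] == r] for r in regions}
    w_reg = {r: sum(w_pop[j] for j in members[r]) for r in regions}  # w_+k

    h_gamma = entropy(sum(row) / total for row in abun)
    h_alpha2 = sum(
        w_reg[r] * entropy(sum(row[j] for j in members[r]) / (w_reg[r] * total)
                           for row in abun)
        for r in regions)
    h_alpha1 = sum(w_pop[j] * entropy(row[j] / pop_totals[j] for row in abun)
                   for j in range(n_pops))

    # Normalizers of the differentiation measures (bottom of Table 3)
    norm2 = entropy(w_reg.values())
    norm1 = -sum(w_pop[j] * math.log(w_pop[j] / w_reg[region_of[j]])
                 for j in range(n_pops))
    return {
        "D_gamma": math.exp(h_gamma),
        "D_alpha2": math.exp(h_alpha2),
        "D_alpha1": math.exp(h_alpha1),
        "D_beta2": math.exp(h_gamma - h_alpha2),
        "D_beta1": math.exp(h_alpha2 - h_alpha1),
        "Delta2": (h_gamma - h_alpha2) / norm2,
        "Delta1": (h_alpha2 - h_alpha1) / norm1,
    }


# Toy data: two regions share no species; populations within a region are identical.
abun = [[5, 5, 0, 0], [5, 5, 0, 0], [0, 0, 5, 5], [0, 0, 5, 5]]
region_of = ["A", "A", "B", "B"]
result = decompose(abun, region_of)
```

In this toy case the regions are completely distinct and the populations within a region are identical, so the sketch should return a differentiation of 1 among regions and 0 among populations within regions.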

6 | SIMULATION STUDY TO SHOW THE CHARACTERISTICS OF THE FRAMEWORK

Here, we describe a simple simulation study to demonstrate the utility and numerical behaviour of the proposed framework. We considered an ecosystem composed of 32 populations divided into four hierarchical levels (ecosystem, region, subregion, population; Figure 1). The number of populations at each level was kept constant across all simulations (i.e., ecosystem with 32 populations, regions with 16 populations each and subregions with eight

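TABLE 3 Formulas for α, β and γ, along with differentiation measures, at each hierarchical level of spatial subdivision for species/allelic diversity and phylogenetic diversity. Here $D = {}^{1}D$ (Hill number of order q = 1 in Equation 2), $PD = {}^{1}PD$ (phylogenetic diversity of order q = 1 in Equation 5), T denotes the depth of an ultrametric tree, H = Shannon entropy (Equation 2), I = phylogenetic entropy (Equation 6)

| Hierarchical level | Diversity | Species/allelic diversity | Phylogenetic diversity |
| Level 3: Ecosystem | gamma | $D_\gamma = \exp\left(-\sum_{i=1}^{S} p_{i \mid ++} \ln p_{i \mid ++}\right) \equiv \exp(H_\gamma)$ | $PD_\gamma = T \times \exp\left(-\sum_{i=1}^{B} L_i a_{i \mid ++} \ln a_{i \mid ++} / T\right) \equiv T \times \exp(I_\gamma / T)$ |
| Level 2: Region | gamma | $D_\gamma^{(2)} = D_\gamma$ | $PD_\gamma^{(2)} = PD_\gamma$ |
| | alpha | $D_\alpha^{(2)} = \exp(H_\alpha^{(2)})$, where $H_\alpha^{(2)} = \sum_k w_{+k} H_{\alpha k}^{(2)}$ and $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i \mid +k} \ln p_{i \mid +k}$ | $PD_\alpha^{(2)} = T \times \exp(I_\alpha^{(2)} / T)$, where $I_\alpha^{(2)} = \sum_k w_{+k} I_{\alpha k}^{(2)}$ and $I_{\alpha k}^{(2)} = -\sum_{i=1}^{B} L_i a_{i \mid +k} \ln a_{i \mid +k}$ |
| | beta | $D_\beta^{(2)} = D_\gamma^{(2)} / D_\alpha^{(2)}$ | $PD_\beta^{(2)} = PD_\gamma^{(2)} / PD_\alpha^{(2)}$ |
| Level 1: Population or community | gamma | $D_\gamma^{(1)} = D_\alpha^{(2)}$ | $PD_\gamma^{(1)} = PD_\alpha^{(2)}$ |
| | alpha | $D_\alpha^{(1)} = \exp(H_\alpha^{(1)})$, where $H_\alpha^{(1)} = \sum_{j,k} w_{jk} H_{\alpha jk}^{(1)}$ and $H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i \mid jk} \ln p_{i \mid jk}$ | $PD_\alpha^{(1)} = T \times \exp(I_\alpha^{(1)} / T)$, where $I_\alpha^{(1)} = \sum_{j,k} w_{jk} I_{\alpha jk}^{(1)}$ and $I_{\alpha jk}^{(1)} = -\sum_{i=1}^{B} L_i a_{i \mid jk} \ln a_{i \mid jk}$ |
| | beta | $D_\beta^{(1)} = D_\gamma^{(1)} / D_\alpha^{(1)}$ | $PD_\beta^{(1)} = PD_\gamma^{(1)} / PD_\alpha^{(1)}$ |
| Differentiation among aggregates at each level | | | |
| Level 2: Among regions | | $\Delta_D^{(2)} = \dfrac{H_\gamma - H_\alpha^{(2)}}{-\sum_k w_{+k} \ln w_{+k}}$ | $\Delta_{PD}^{(2)} = \dfrac{I_\gamma - I_\alpha^{(2)}}{-T \sum_k w_{+k} \ln w_{+k}}$ |
| Level 1: Population/community within region | | $\Delta_D^{(1)} = \dfrac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{j,k} w_{jk} \ln(w_{jk} / w_{+k})}$ | $\Delta_{PD}^{(1)} = \dfrac{I_\alpha^{(2)} - I_\alpha^{(1)}}{-T \sum_{j,k} w_{jk} \ln(w_{jk} / w_{+k})}$ |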


populations each). Note that here we used a hierarchy with four spatial subdivisions instead of the three levels used in the presentation of the framework. We used three levels in the presentation to keep the calculations simple, and four levels in the simulations to verify the performance of the framework in a more in-depth manner.

We explored six scenarios varying in the degree of genetic structuring, from very strong (Figure 2, top left panel) to very weak (Figure 2, bottom right panel), and for each, we generated spatially structured genetic data for 10 unlinked bi-allelic loci using an algorithm loosely based on the genetic model of Coop, Witonsky, Di Rienzo, and Pritchard (2010). More explicitly, to generate correlated allele frequencies across populations for bi-allelic loci, we drew 10 random vectors of dimension 32 from a multivariate normal distribution with mean zero and a covariance matrix corresponding to the particular genetic structure scenario being considered. To construct the covariance matrix, we first assumed that the covariance between populations decreased with distance, so that the off-diagonal elements (covariances) for closest geographic neighbours were set to 4, those for the second nearest neighbours were set to 3 and so on; as such, the main diagonal values (variance) were set to 5. By multiplying the off-diagonal elements of this variance–covariance matrix by a constant (δ), we manipulated the strength of the spatial genetic structure from strong (δ = 0.1; Figure 2) to weak (δ = 6; Figure 2). Delta values were chosen to demonstrate gradual changes in estimates across diversity components. Using this procedure, we generated a matrix of random normally distributed N(0, 1) deviates $\varepsilon_{il}$ for each population i and locus l. The random deviates were transformed into allele frequencies constrained between 0 and 1 using the simple transform

$$ p_{il} = \begin{cases} 0 & \text{if } \varepsilon_{il} < 0 \\ \varepsilon_{il} & \text{if } 0 \le \varepsilon_{il} \le 1 \\ 1 & \text{if } \varepsilon_{il} > 1 \end{cases} $$

where $p_{il}$ is the relative frequency of allele A1 at the lth locus in population i and, therefore, $q_{il} = (1 - p_{il})$ is the relative frequency of allele A2. Each bi-allelic locus was analysed separately by our framework, and estimated values of $D_\gamma$, $D_\alpha$ and $D_\beta$ for each spatial level (see Figure 1) were averaged across the 10 loci.

To simulate a realistic distribution of number of individuals across populations, we generated random values from a log-normal distribution with mean 0 and log of standard deviation 1; these values were then multiplied by randomly generated deviates from a Poisson distribution with λ = 30 to obtain a wide range of population/community sizes. Rounded values (to mimic abundances of individuals) were then multiplied by $p_{il}$ and $q_{il}$ to generate allele abundances. Given that the number of individuals was randomly generated across populations, there is no spatial correlation in abundance of individuals across the landscape, which means that the genetic spatial patterns were solely determined by the variance–covariance matrix used to generate correlated allele frequencies across populations. This facilitates interpretation of the simulation results, allowing us to demonstrate that the framework can uncover subtle spatial effects associated with population connectivity (see below).

For each spatial structure, we generated 100 matrices of allele frequencies, and each matrix was analysed separately to obtain distributions for $D_\gamma$, $D_\alpha$, $D_\beta$ and $\Delta_D$. Figure 2 presents heat maps of the correlation in allele frequencies across populations for one simulated data set under each δ value and shows that our algorithm can generate a wide range of genetic structures comparable to those generated by other more complex simulation protocols (e.g., de Villemereuil, Frichot, Bazin, Francois, & Gaggiotti, 2014).

Figure 3 shows the distribution of $D_\alpha$, $D_\beta$ and $\Delta_D$ values for the three levels of geographic variation below the ecosystem level (i.e., $D_\gamma$, genetic diversity). The results clearly show that our framework detects differences in genetic diversity across different levels of spatial genetic structure. As expected, the effective number of alleles ($D_\alpha$ component, top row) increases per region and subregion as the spatial structure becomes weaker (i.e., from small to large δ values) but remains constant at the population level, as there is no spatial structure at this level (i.e., populations are panmictic), so diversity is independent of δ.

The $D_\beta$ component (middle row) quantifies the effective number of aggregates (regions, subregions, populations) at each hierarchical level of spatial subdivision. The larger the number of aggregates at a given level, the more heterogeneous that level is. Thus, it is also a measure of compositional dissimilarity at each level. We use this interpretation to describe the results in a more intuitive manner. As expected, as δ increases, dissimilarity between regions (middle left panel) decreases because spatial genetic structure becomes weaker, and the compositional dissimilarity among populations within subregions (middle right panel) increases because the strong spatial correlation among populations within subregions breaks down (Figure 3, centre left panel). The compositional dissimilarity between subregions within regions (Figure 3, middle centre panel) first increases and then decreases with increasing δ. This is due to an "edge effect" associated with the marginal status of the subregions at the extremes of the species range (extreme right and left subregions in Figure 2). As δ increases, the composition of the two subregions at the centre of the species range, which belong to different regions, changes more rapidly than that of the two marginal subregions. Thus, the compositional dissimilarity between subregions within regions increases. However, as δ continues to increase, spatial effects disappear and dissimilarity decreases.
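The data-generating steps of the simulation can be sketched in Python with the standard library only. This is a simplified illustration and not the authors' protocol: it assumes eight populations on a line (so that covariance falls from 4 to 0 with neighbour rank), restricts δ to values for which the matrix is positive definite, and uses hypothetical helper names throughout.

```python
import math
import random


def distance_cov(n_pops, delta, variance=5.0):
    """Variance on the diagonal; covariances 4, 3, 2, ... (floored at 0) by
    neighbour rank, with every off-diagonal entry multiplied by delta."""
    return [[variance if i == j else delta * max(variance - abs(i - j), 0.0)
             for j in range(n_pops)] for i in range(n_pops)]


def cholesky(a):
    """Lower-triangular factor; raises if the matrix is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def poisson(lam, rng):
    """Knuth's multiplication method for a Poisson deviate."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_locus(n_pops, delta, rng):
    """Return allele-A1 frequencies, population sizes and (A1, A2) counts."""
    low = cholesky(distance_cov(n_pops, delta))
    z = [rng.gauss(0.0, 1.0) for _ in range(n_pops)]
    deviates = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n_pops)]
    freqs = [min(1.0, max(0.0, e)) for e in deviates]   # truncation transform
    sizes = [round(rng.lognormvariate(0.0, 1.0) * poisson(30.0, rng))
             for _ in range(n_pops)]
    counts = []
    for p, n in zip(freqs, sizes):
        a1 = round(n * p)
        counts.append((a1, n - a1))
    return freqs, sizes, counts


rng = random.Random(1)
freqs, sizes, counts = simulate_locus(8, 0.5, rng)
```

The Cholesky factor turns independent normal deviates into correlated ones. In this sketch the scaled matrix stays positive definite only for δ of at most 1, so the larger δ values reported in the text would need a different construction from the one assumed here.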

FIGURE 2 Heat maps of allele frequency correlations between pairs of populations for different δ values. Delta values control the strength of the spatial genetic structure among populations, with low δs having the strongest spatial correlation among populations. Each heat map represents the outcome of a single simulation, and each dot represents the allele frequency correlation between two populations. Thus, the diagonal represents the correlation of a population with itself and is always 1, regardless of the δ value considered in the simulation. Colours indicate the range of correlation values. As in Figure 1, the dendrograms represent the spatial relationship (i.e., geographic distance) between populations.


The differentiation component $\Delta_D$ (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity $D_\beta$. This is expected, as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix

FIGURE 3 Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity gamma is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions).


in which different δ values would be used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength of spatial genetic variation.

For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ: $D_\gamma$ = 1.6 on average across simulations for δ = 0.1, up to $D_\gamma$ = 1.9 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here regions, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.

7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM

All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish, Etelis coruscans (Andrews et al., 2014), and a shallow-water fish, Zebrasoma flavescens (Eble et al., 2011).

The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string

FIGURE 4 Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass.


of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km southwest of the MHI at Johnston Atoll and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).

We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.

7.1 | Species diversity

Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, $D_\gamma$, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_\alpha^{(2)}$ = 37.77; Island: $D_\alpha^{(1)}$ = 27.75). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_\beta^{(2)}$ = 1.29 represents the number of region equivalents in the Hawaiian archipelago, while $D_\beta^{(1)}$ = 1.361 is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on the sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain $\Delta_D$ (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
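The multiplicative relations of Table 1 can be checked against the values reported in Table 4. The short Python sketch below does only that arithmetic. The variable names are hypothetical, and the "implied normalizer" is a derived quantity that assumes the Table 3 identity $\Delta_D = \ln D_\beta$ divided by the weight entropy; it is not a number reported by the authors.

```python
import math

# Values reported in Table 4 (fish species diversity of order q = 1)
d_gamma = 48.744
d_alpha2 = 37.773   # regional alpha
d_alpha1 = 27.752   # island alpha
delta2 = 0.290      # differentiation among regions

# Beta diversity at each level is the ratio of gamma to alpha at that level
d_beta2 = d_gamma / d_alpha2
d_beta1 = d_alpha2 / d_alpha1

# Species equivalents lost when descending one level
lost_region = d_gamma - d_alpha2
lost_island = d_alpha2 - d_alpha1

# Since H_gamma - H_alpha = ln(D_beta), the normalizer implied by Table 3 is
# ln(D_beta) / Delta; for three regions it cannot exceed ln(3).
implied_norm2 = math.log(d_beta2) / delta2
```

The two ratios reproduce the tabulated beta diversities to three decimals, and the implied normalizer stays below ln(3), as it must for three regions.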

We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than those of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and to a lesser extent Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
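How the order q changes the weight given to rare elements can be seen with a small Python sketch of Hill numbers. The two communities below are invented for illustration: one has a single species at a relative abundance of 0.551 plus 20 equally rare species, the other is perfectly even. Neither is the Nihoa data.

```python
import math


def hill_number(freqs, q):
    """Hill number of order q for a vector of relative frequencies."""
    p = [x for x in freqs if x > 0]
    if abs(q - 1.0) < 1e-12:
        # q = 1 is defined as the limit: exponential of Shannon entropy
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))


# Hypothetical communities with 21 species each
dominated = [0.551] + [0.449 / 20] * 20
even = [1.0 / 21] * 21

profile_dominated = [hill_number(dominated, q) for q in (0, 1, 2)]
profile_even = [hill_number(even, q) for q in (0, 1, 2)]
```

For the even community all three orders return the richness, whereas for the dominated community the values fall as q increases, which is the pattern the text uses to diagnose dominance.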

Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.

7.2 | Genetic diversity

Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases, genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed

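TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

| Level | Diversity |
| 3 Hawaiian Archipelago | $D_\gamma$ = 48.744 |
| 2 Region | $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)}$ = 37.773; $D_\beta^{(2)}$ = 1.290 |
| 1 Island (community) | $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)}$ = 27.752; $D_\beta^{(1)}$ = 1.361 |
| Differentiation among aggregates at each level | |
| 2 Region | $\Delta_D^{(2)}$ = 0.290 |
| 1 Island (community) | $\Delta_D^{(1)}$ = 0.153 |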


among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.

Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.

As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).

8 | DISCUSSION

Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.

Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and, therefore, give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g.,

FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities; (b) genetic diversity for Etelis coruscans; (c) genetic diversity for Zebrasoma flavescens. Panel titles: (a) species diversity; (b) E. coruscans; (c) Z. flavescens


bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts the Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.

As proved by Chao et al. (2015, appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population, and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as $G_{ST}$ and Jost's D, do not possess either of these two properties.

Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and

FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness; (b) allelic richness for Etelis coruscans; (c) allelic richness for Zebrasoma flavescens. Panel titles: (a) Species diversity; (b) Genetic diversity, E. coruscans; (c) Genetic diversity, Z. flavescens

TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

| Level | Diversity |
| 3 Hawaiian Archipelago | $D_\gamma$ = 8.249 |
| 2 Region | $D_\gamma^{(2)} = D_\gamma$; $D_\alpha^{(2)}$ = 8.083; $D_\beta^{(2)}$ = 1.016 |
| 1 Island (population) | $D_\gamma^{(1)} = D_\alpha^{(2)}$; $D_\alpha^{(1)}$ = 7.077; $D_\beta^{(1)}$ = 1.117 |
| Differentiation among aggregates at each level | |
| 2 Region | $\Delta_D^{(2)}$ = 0.023 |
| 1 Island (community) | $\Delta_D^{(1)}$ = 0.062 |


our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.

Recently, Karlin and Smouse (2017; Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
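The two-population example can be reproduced with a few lines of Python. The sketch below computes the two-level differentiation (gamma entropy minus weighted alpha entropy, divided by the entropy of the weights) for two equally weighted populations; the function name and the allele labels are hypothetical.

```python
import math


def shannon(probs):
    """Shannon entropy (natural log); zero entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def two_level_differentiation(pops, weights):
    """pops: list of dicts mapping allele -> relative frequency.
    weights: relative weights of the populations (summing to 1)."""
    alleles = sorted({a for pop in pops for a in pop})
    pooled = [sum(w * pop.get(a, 0.0) for pop, w in zip(pops, weights))
              for a in alleles]
    h_gamma = shannon(pooled)
    h_alpha = sum(w * shannon(pop.values()) for pop, w in zip(pops, weights))
    return (h_gamma - h_alpha) / shannon(weights)


# Populations I and II: 10 equally frequent alleles each, 4 of them shared
pop_1 = {f"a{i}": 0.1 for i in range(10)}       # a0 ... a9
pop_2 = {f"a{i}": 0.1 for i in range(6, 16)}    # a6 ... a15 (shares a6 ... a9)
delta = two_level_differentiation([pop_1, pop_2], [0.5, 0.5])
```

Here the entropy difference works out to 0.6 ln 2 and the normalizer to ln 2, so the measure returns 1 − A/S = 0.6.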

In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would be surely observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
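The negative bias of the observed (plug-in) entropy can be illustrated with an exact calculation. The sketch below assumes a hypothetical two-allele population with frequencies 0.9 and 0.1 and binomial sampling of n gene copies; it only demonstrates the bias and does not implement the Chao and Jost (2015) estimator.

```python
import math

def entropy(freqs):
    """Shannon entropy (natural log); zero frequencies contribute nothing."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def expected_plugin_entropy(p, n):
    """Exact expectation of the plug-in entropy when n gene copies are drawn
    from a two-allele population with true frequencies (p, 1 - p)."""
    total = 0.0
    for x in range(n + 1):
        # Binomial probability of observing x copies of the first allele.
        prob = math.comb(n, x) * p**x * (1 - p)**(n - x)
        total += prob * entropy([x / n, 1 - x / n])
    return total

true_h = entropy([0.9, 0.1])
for n in (5, 20, 100):
    # The expected observed entropy stays below the true value and
    # approaches it as the sample size grows.
    print(n, round(expected_plugin_entropy(0.9, n), 4), round(true_h, 4))
```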

Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity, $D_\beta$, and differentiation, $\Delta_D$, measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.

The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species and, therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.

Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural

TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci.

Level                    Diversity

3 Hawaiian Archipelago   $D_\gamma = 8.404$

2 Region                 $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)} = 8.290$, $D_\beta^{(2)} = 1.012$

1 Island (community)     $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)} = 7.690$, $D_\beta^{(1)} = 1.065$

Differentiation among aggregates at each level

2 Region                 $\Delta_D^{(2)} = 0.014$

1 Island (community)     $\Delta_D^{(1)} = 0.033$


populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization and, therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
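The inconsistency between richness-type and frequency-weighted measures can be made concrete with Hill numbers of different orders. The sketch below uses the standard definitions (q = 0 gives richness, q = 1 the exponential of Shannon entropy, q = 2 the inverse of the sum of squared frequencies); the community frequencies are invented for illustration only.

```python
import math

def hill_number(freqs, q):
    """Hill number (effective number of elements) of order q."""
    freqs = [p for p in freqs if p > 0]
    if q == 1:
        # Limit as q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in freqs))
    return sum(p**q for p in freqs) ** (1.0 / (1.0 - q))

# Hypothetical community dominated by one species.
community = [0.70, 0.20, 0.05, 0.03, 0.02]
for q in (0, 1, 2):
    # q = 0 counts every species equally; higher orders increasingly
    # discount the rare ones, so the values are non-increasing in q.
    print(q, round(hill_number(community, q), 3))
```

Correlating a q = 0 measure at the species level with a q = 2 measure at the genetic level therefore mixes two different weightings of rare versus common elements, which is the inconsistency described above.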

In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)'s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)'s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals, or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
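As an illustration of the "simple transformation" mentioned above, the following sketch computes the phylogenetic entropy $I = -\sum_i L_i a_i \ln a_i$ and, following our reading of Chao et al. (2010), the order-1 measures $\exp(I/T)$ and $T\exp(I/T)$. The tree, the abundances and the function names are hypothetical; the symbols $L_i$, $a_i$ and $T$ follow Appendix S2.

```python
import math

def phylogenetic_entropy(branches):
    """branches: list of (L_i, a_i) pairs, where L_i is the branch length and
    a_i the total relative abundance descended from that branch."""
    return -sum(length * a * math.log(a) for length, a in branches if a > 0)

def phylogenetic_diversity_order1(branches):
    """Return (exp(I / T), T * exp(I / T)) for a tree given as branch pairs."""
    # Abundance-weighted mean branch length; equals the depth T
    # when the tree is ultrametric.
    t_bar = sum(length * a for length, a in branches)
    mean_diversity = math.exp(phylogenetic_entropy(branches) / t_bar)
    return mean_diversity, t_bar * mean_diversity

# Hypothetical ultrametric tree of depth 2: species A and B share an internal
# branch of length 1; species C sits on its own branch of length 2.
tree = [
    (1.0, 0.25),  # tip branch to A
    (1.0, 0.25),  # tip branch to B
    (1.0, 0.50),  # internal branch ancestral to A and B
    (2.0, 0.50),  # tip branch to C
]
mean_diversity, branch_diversity = phylogenetic_diversity_order1(tree)
print(round(mean_diversity, 3), round(branch_diversity, 3))
```

When all species descend directly from the root (a star tree), the first value reduces to the ordinary Hill number of order 1, which is why the phylogenetic measure can be read as a generalization of the abundance-based one.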

There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level is transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithm may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.

The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.

An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.


ACKNOWLEDGEMENTS

This work was assisted through participation in "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.

DATA ARCHIVING STATEMENT

All data used in this manuscript are available in DRYAD (https://doi.org/dx.doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).

ORCID

Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493

Christine Edwards http://orcid.org/0000-0001-8837-4872

REFERENCES

Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2

Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101

Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.

Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665

Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.

Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.

Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654

Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551

Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247

Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272

Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540

Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349

Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471

Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014

Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819

Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083

Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1

Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2

Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036

Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75

Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352

Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x

Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1

Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x

Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1

Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x

Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum × falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.


Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.

Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.

Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3

Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x

Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.

Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299

Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x

Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x

Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.

Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x

Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.

Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.

Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696

Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B-Biological Sciences, 283(1829).

Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z

Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765

Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x

Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.

Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422

Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756

de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705

Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.

Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x

Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516

Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.

Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593


APPENDIX S1: Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele    Population 1    Population 2

A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
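The equivalence stated in Appendix S1 can be verified directly. The sketch below assumes that the expected heterozygosity is $H_e = 1 - \sum_i p_i^2$ and that the heterozygosity-based effective number of alleles is $1/(1 - H_e)$; the allele frequencies are read from the table above with decimal points restored, which is itself an assumption about the garbled source.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity, 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def effective_number_of_alleles(freqs):
    """Number of equally frequent alleles giving the same heterozygosity."""
    return 1.0 / (1.0 - expected_heterozygosity(freqs))

pop1 = [0.5, 0.5, 0.0, 0.0, 0.0]
pop2 = [0.01, 0.10, 0.665, 0.215, 0.01]

for name, pop in (("Population 1", pop1), ("Population 2", pop2)):
    # Both populations have He close to 0.5 and hence about two effective alleles.
    print(name,
          round(expected_heterozygosity(pop), 3),
          round(effective_number_of_alleles(pop), 2))
```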

APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol    Definition

$^{q}D$    abundance diversity of order q, also referred to as Hill number of order q

$^{q}PD$    phylogenetic diversity of order q

$S$    total number of distinct elements (= species/alleles)

$H$    Shannon entropy

$K$    number of regions

$J_k$    number of local populations/communities within region k

$w_{jk}$    weight given to population/community j of region k

$w_{+k}$    weight given to region k

$D_\alpha^{(l)}$    alpha abundance diversity at level l of the hierarchy

$D_\beta^{(l)}$    beta abundance diversity at level l of the hierarchy

$D_\gamma^{(l)}$    gamma abundance diversity at level l of the hierarchy

$l$    superscript to denote population/community, subregion, region, …

$D_\gamma$    gamma abundance diversity at the ecosystem level

$N_{injk}$    number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k

$N_{ijk}$    total number of copies of allele/species i in population/community j of region k

$N_{+jk}$    total number of alleles/individuals in population j of region k

$N_{++k}$    total number of alleles/individuals in region k

$N_{+++}$    total number of alleles/individuals in the ecosystem

$p_{i|jk}$    frequency of allele/species i in population j of region k

$p_{i|+k}$    frequency of allele/species i in region k

$p_{i|++}$    frequency of allele/species i in the ecosystem

$H_{\alpha jk}^{(1)}$    alpha entropy for population j of region k

$H_{\alpha k}^{(2)}$    alpha entropy for region k

$H_\alpha^{(l)}$    total alpha entropy at level l of the hierarchy

$\Delta_D^{(l)}$    abundance diversity differentiation among aggregates (l = populations/communities, regions)

$B$    number of branch segments in the phylogenetic tree

$L_i$    length of branch i = 1, 2, 3, …, B

$a_i$    total relative abundance of elements (alleles/species) descended from the ith node/branch

$\bar{T}$    mean branch length

$T$    depth of an ultrametric tree (= $\bar{T}$)

$Q$    Rao's quadratic entropy

$I$    phylogenetic entropy

$a_{i|jk}$    total relative abundance of alleles/species descended from node i in population/community j of region k

$a_{i|+k}$    total relative abundance of alleles/species descended from node i across all populations/communities in region k

$a_{i|++}$    total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem

$PD_\alpha^{(l)}$    alpha phylogenetic diversity at level l of the hierarchy

$PD_\beta^{(l)}$    beta phylogenetic diversity at level l of the hierarchy

$PD_\gamma^{(l)}$    gamma phylogenetic diversity at level l of the hierarchy

$PD_\gamma$    gamma phylogenetic diversity at the ecosystem level

$I_{\alpha jk}^{(1)}$    alpha phylogenetic entropy for population j of region k

$I_{\alpha k}^{(2)}$    alpha phylogenetic entropy for region k

$I_\alpha^{(l)}$    total alpha phylogenetic entropy at level l of the hierarchy

$\Delta_{PD}^{(l)}$    phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)


APPENDIX S3: Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency." We first present the details for two-level hierarchy, and then extend all procedures to three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$, with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ln \left( \sum_{m=1}^{J} w_m p_{i|m} \right)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1: For the gamma and alpha entropies defined above for two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof: Since $f(x) = -x \ln x$ is a concave function, it follows from the Jensen inequality that for any allele i, we have

$$-\left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ln \left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ln \left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

$$H_\gamma = -\sum_{i=1}^{S} \left\{ \sum_{j=1}^{J} w_j p_{i|j} \ln \left( \sum_{m=1}^{J} w_m p_{i|m} \right) \right\} \le -\sum_{i=1}^{S} \left\{ \sum_{j=1}^{J} w_j p_{i|j} \ln \left( w_j p_{i|j} \right) \right\}$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
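The bounds of Theorem S3.1 can be checked numerically. The sketch below is only an illustration under assumed inputs: the three populations, their weights and the function names are hypothetical, and allele frequencies are stored as equal-length lists indexed by allele.

```python
import math

def entropy(freqs):
    """Shannon entropy (natural log); zero frequencies contribute nothing."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def gamma_alpha_gap(pops, weights):
    """H_gamma - H_alpha for a two-level hierarchy."""
    n_alleles = len(pops[0])
    pooled = [sum(w * pop[i] for w, pop in zip(weights, pops))
              for i in range(n_alleles)]
    h_gamma = entropy(pooled)
    h_alpha = sum(w * entropy(pop) for w, pop in zip(weights, pops))
    return h_gamma - h_alpha

weights = [0.2, 0.3, 0.5]
upper = entropy(weights)  # -sum_j w_j ln w_j

identical = [[0.6, 0.3, 0.1, 0, 0, 0]] * 3
distinct = [[0.5, 0.5, 0, 0, 0, 0],
            [0, 0, 1.0, 0, 0, 0],
            [0, 0, 0, 0.2, 0.3, 0.5]]
mixed = [[0.5, 0.5, 0, 0, 0, 0],
         [0.1, 0.4, 0.5, 0, 0, 0],
         [0, 0, 0.3, 0.2, 0.3, 0.2]]

for label, pops in (("identical", identical), ("mixed", mixed), ("distinct", distinct)):
    # The gap is 0 for identical populations, equals the upper bound for
    # completely distinct populations, and lies in between otherwise.
    print(label, round(gamma_alpha_gap(pops, weights), 4), round(upper, 4))
```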


From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2: Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:

(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.

(A2) Shannon differentiation measure is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: If multiple communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.

Proof: For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \left[ H_\alpha^{(2)} - H_\alpha^{(1)} \right] + \left[ H_\gamma - H_\alpha^{(2)} \right]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information; $\left[ H_\alpha^{(2)} - H_\alpha^{(1)} \right]$ denotes the among-population information within a region; and $\left[ H_\gamma - H_\alpha^{(2)} \right]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3: For the decomposition given in Eq. (C2) in three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

Proof: To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \left( \sum_{k=1}^{K} w_{+k} p_{i|+k} \right) \ln \left( \sum_{m=1}^{K} w_{+m} p_{i|+m} \right).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region $k$ and all populations within region $k$): there are $J_k$ populations, with allele relative frequency $p_{i|jk}$ for allele $i$ in population $j$, and with population weights $(w_{1k}/w_{+k},\, w_{2k}/w_{+k},\, \ldots,\, w_{J_k k}/w_{+k})$, $j = 1, 2, \ldots, J_k$. The “gamma” entropy for this two-level hierarchy is the entropy value for region $k$, i.e.,

$$H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}, \quad \text{where} \quad p_{i|+k} = \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}}\, p_{i|jk}.$$

The corresponding “alpha” entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem C1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over $k$ with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \qquad \text{(C5)}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \qquad \text{(C6)}$$

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, each allele occurs in exactly one population, so $p_{i|++} = w_{jk}\, p_{i|jk}$ for that population, and we can decompose the gamma entropy as

$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln \left( w_{jk}\, p_{i|jk} \right) \right\} \\
&= -\sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \left[ \ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k} \right] \right\} \\
&= -\sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln p_{i|jk} \right\}
 - \sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln \frac{w_{jk}}{w_{+k}} \right\}
 - \sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln w_{+k} \right\} \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right].
\end{aligned}$$

In this special case, we have

$$H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} \quad \text{and} \quad H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}.$$
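The decomposition in Eq. (C2) and the bounds in Eqs. (C3) and (C4) can be checked numerically. The following is a minimal Python sketch, not the authors' R code; the function names `entropy` and `decompose` and the toy frequencies are assumptions made for illustration. It builds three populations with no shared alleles and confirms that both between-level components reach their upper bounds.

```python
import math

def entropy(freqs):
    """Shannon entropy (natural log) of a vector of relative frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def decompose(pops, weights, region_of):
    """Three-level decomposition of the gamma entropy (hypothetical helper).

    pops      -- one allele relative-frequency vector per population
    weights   -- population weights w_jk (they sum to 1)
    region_of -- region label of each population
    """
    regions = sorted(set(region_of))
    n_alleles = len(pops[0])
    w_region = {r: sum(w for w, g in zip(weights, region_of) if g == r)
                for r in regions}
    # Pooled frequencies for each region and for the whole ecosystem.
    p_region = {r: [sum(w * p[i] for w, p, g in zip(weights, pops, region_of)
                        if g == r) / w_region[r]
                    for i in range(n_alleles)]
                for r in regions}
    p_total = [sum(w * p[i] for w, p in zip(weights, pops))
               for i in range(n_alleles)]

    h_gamma = entropy(p_total)
    h_alpha2 = sum(w_region[r] * entropy(p_region[r]) for r in regions)
    h_alpha1 = sum(w * entropy(p) for w, p in zip(weights, pops))
    # Upper bounds from Eqs. (C3) and (C4).
    bound_regions = entropy(w_region.values())
    bound_pops = -sum(w * math.log(w / w_region[g])
                      for w, g in zip(weights, region_of) if w > 0)
    return {
        "gamma": h_gamma,
        "within": h_alpha1,
        "among_pops": h_alpha2 - h_alpha1,
        "among_regions": h_gamma - h_alpha2,
        "bound_pops": bound_pops,
        "bound_regions": bound_regions,
    }

# Three populations with no shared alleles: two in region "A", one in region "B".
distinct = decompose(
    pops=[[0.5, 0.5, 0, 0, 0, 0], [0, 0, 0.3, 0.7, 0, 0], [0, 0, 0, 0, 0.4, 0.6]],
    weights=[0.2, 0.3, 0.5],
    region_of=["A", "A", "B"],
)
```

Dividing `among_regions` by `bound_regions` and `among_pops` by `bound_pops` gives the two normalized differentiation measures, which equal 1 for this toy data set.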

As we proved in Theorem C3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and “true dissimilarity” properties stated in Theorem C2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The Shannon-type gamma and alpha phylogenetic entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i\, a_{i|++} \ln a_{i|++},$$

$$I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i\, a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i\, a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \left[I_\alpha^{(2)} - I_\alpha^{(1)}\right] + \left[I_\gamma - I_\alpha^{(2)}\right]. \qquad \text{(C7)}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem C4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth $T$:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C8)}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \qquad \text{(C9)}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have

$$I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} \quad \text{and} \quad I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}.$$

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and “true dissimilarity” properties, are parallel to those of allelic diversity and thus are omitted.
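The among-region bound in Eq. (C8) can be illustrated with a small Python sketch. It is not part of the authors' code: the helper names, the representation of a tree as a list of (branch length, descendant tips) pairs, and the toy tree ((A:1,B:1):1,(C:1,D:1):1) of depth T = 2 are all assumptions made for this example.

```python
import math

def phylo_entropy(tip_freqs, branches):
    """I = -sum_i L_i a_i ln a_i, with a_i the total frequency of tips below branch i."""
    total = 0.0
    for length, tips in branches:
        a = sum(tip_freqs.get(t, 0.0) for t in tips)
        if a > 0:
            total -= length * a * math.log(a)
    return total

def region_differentiation(region_freqs, region_weights, branches, depth):
    """Normalized among-region phylogenetic differentiation for an ultrametric tree."""
    tips = set().union(*region_freqs)
    pooled = {t: sum(w * f.get(t, 0.0) for w, f in zip(region_weights, region_freqs))
              for t in tips}
    i_gamma = phylo_entropy(pooled, branches)
    i_alpha = sum(w * phylo_entropy(f, branches)
                  for w, f in zip(region_weights, region_freqs))
    # Upper bound of Eq. (C8): -T * sum_k w_+k ln w_+k.
    bound = -depth * sum(w * math.log(w) for w in region_weights if w > 0)
    return (i_gamma - i_alpha) / bound

# Toy ultrametric tree ((A:1,B:1):1,(C:1,D:1):1), depth 2.
BRANCHES = [(1.0, {"A"}), (1.0, {"B"}), (1.0, {"A", "B"}),
            (1.0, {"C"}), (1.0, {"D"}), (1.0, {"C", "D"})]

# Two equally weighted regions that share no branches.
distinct = region_differentiation(
    [{"A": 0.5, "B": 0.5}, {"C": 0.5, "D": 0.5}], [0.5, 0.5], BRANCHES, depth=2.0)

# Two equally weighted regions with identical composition.
same = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
identical = region_differentiation([same, same], [0.5, 0.5], BRANCHES, depth=2.0)
```

In this toy case the measure reaches its maximum when the two regions occupy separate clades and its minimum when they have the same composition.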

SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represent the ecosystem. Shannon entropies ($H_*$) are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 panels: columns Within, Between, Total and Decomposition; rows Level 3 (Ecosystem), Level 2 (Region) and Level 1 (Population or Community).]

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set, where the first elements (in the present example, 5) correspond to species/alleles abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species $i$.

[Figure S2 labels: tips $p_1$ to $p_5$; internal nodes $p_1 + p_2$, $p_1 + p_2 + p_3$ and $p_4 + p_5$; root MRCA.]
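The extended abundance set described in the caption of Figure S2 can be sketched in Python as follows. The function name, the node labels and the numerical tip abundances are hypothetical; only the grouping of tips under each internal node follows the figure.

```python
def extended_abundances(tip_freqs, internal_nodes):
    """Append to the tip abundances the summed abundance below each internal node.

    tip_freqs      -- dict: tip label -> relative abundance
    internal_nodes -- dict: node label -> list of tips descended from that node
    """
    extended = dict(tip_freqs)
    for node, tips in internal_nodes.items():
        extended[node] = sum(tip_freqs[t] for t in tips)
    return extended

# Illustrative relative abundances for the five tips (assumed values).
p = {"p1": 0.10, "p2": 0.20, "p3": 0.30, "p4": 0.15, "p5": 0.25}

# Internal nodes of the tree in Figure S2.
nodes = {
    "a6": ["p1", "p2"],        # p1 + p2
    "a7": ["p1", "p2", "p3"],  # p1 + p2 + p3
    "a8": ["p4", "p5"],        # p4 + p5
}

a = extended_abundances(p, nodes)
```

The resulting dictionary holds eight abundances, five for the tips and three for the internal nodes, which is the set on which the phylogenetic effective numbers are based.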

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called “Abun” and “Struc”, respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

    Ecosystem
        Region 1
            Population 1
            Population 2
        Region 2
            Population 3
            Population 4
            Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

              Pop1   Pop2   Pop3   Pop4   Pop5
    Allele1      1     16      2     10     15
    Allele2      0      0      0      5     14
    Allele3      7     12     11      1      0
    Allele4      0      5     14      1     21
    Allele5      2      1      0     11     10
    Allele6      0      1      3      2      0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

              Pop1          Pop2          Pop3          Pop4          Pop5
    Level3    Ecosystem     Ecosystem     Ecosystem     Ecosystem     Ecosystem
    Level2    Region1       Region1       Region2       Region2       Region2
    Level1    Population1   Population2   Population3   Population4   Population5

For the above simple example, input data for R function iDIP (code given below) are shown below.

    ## Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3),
                 c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)),
                  paste0("Population",1:5))

    ## Run R function
    iDIP(Data, Struc)

Output is given below.

                        [,1]
    D_gamma            5.272
    D_alpha2           4.679
    D_alpha1           3.540
    D_beta2            1.127
    D_beta1            1.322
    Proportion2        0.300
    Proportion1        0.700
    Differentiation2   0.204
    Differentiation1   0.310

We give simple interpretations for the above output (“effective” is always meant in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).

(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).

(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
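As a cross-check on the worked example, the three-level calculation can be sketched in Python. This is an independent re-implementation restricted to three levels under our reading of the formulas, not the authors' R function; the name `idip_three_level` and the input layout (a list of rows plus one region label per population) are assumptions.

```python
import math

def shannon(counts):
    """Shannon entropy (natural log) of a vector of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)

def idip_three_level(abun, regions):
    """abun[i][j] = count of allele i in population j; regions[j] = region of population j."""
    n_pop = len(regions)
    pop_cols = [[row[j] for row in abun] for j in range(n_pop)]
    labels = sorted(set(regions))
    # Pool the populations of each region into one column of counts.
    reg_cols = [[sum(row[j] for j in range(n_pop) if regions[j] == r) for row in abun]
                for r in labels]
    n = sum(map(sum, pop_cols))

    h_gamma = shannon([sum(row) for row in abun])
    h_alpha2 = sum(sum(c) / n * shannon(c) for c in reg_cols)
    h_alpha1 = sum(sum(c) / n * shannon(c) for c in pop_cols)
    w_reg = shannon([sum(c) for c in reg_cols])  # -sum_k w_+k ln w_+k
    w_pop = shannon([sum(c) for c in pop_cols])  # -sum_jk w_jk ln w_jk
    total_beta = h_gamma - h_alpha1
    return {
        "D_gamma": math.exp(h_gamma),
        "D_alpha2": math.exp(h_alpha2),
        "D_alpha1": math.exp(h_alpha1),
        "D_beta2": math.exp(h_gamma - h_alpha2),
        "D_beta1": math.exp(h_alpha2 - h_alpha1),
        "Proportion2": (h_gamma - h_alpha2) / total_beta,
        "Proportion1": (h_alpha2 - h_alpha1) / total_beta,
        "Differentiation2": (h_gamma - h_alpha2) / w_reg,
        # w_pop - w_reg equals -sum_jk w_jk ln(w_jk / w_+k).
        "Differentiation1": (h_alpha2 - h_alpha1) / (w_pop - w_reg),
    }

# Example data: alleles in rows, populations Pop1..Pop5 in columns.
DATA = [
    [1, 16, 2, 10, 15],
    [0, 0, 0, 5, 14],
    [7, 12, 11, 1, 0],
    [0, 5, 14, 1, 21],
    [2, 1, 0, 11, 10],
    [0, 1, 3, 2, 0],
]
REGIONS = ["Region1", "Region1", "Region2", "Region2", "Region2"]

result = idip_three_level(DATA, REGIONS)
```

Under these assumptions the sketch should agree, to three decimals, with the output table for the example data set (for instance, D_gamma = 5.272 and Differentiation2 = 0.204).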

    ### R code for Information-based Diversity Partitioning for Any Number of Levels
    ### input data:
    ###   abun:  alleles-by-population frequency matrix data (or species-by-community
    ###          frequency matrix data)
    ###   struc: level-by-population hierarchical structure matrix (or level-by-community
    ###          hierarchical structure matrix)
    ### output:
    ###   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
    ###   (2) Proportion of total beta information found at each level.
    ###   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
    ###       measures the mean dissimilarity among populations (level 1) within a region;
    ###       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
    ### NOTE: iDIP cannot handle "blank" data or "NA" entry. You must replace "blank"
    ###       or "NA" in your data by 0 or any numerical value. Also, there must be at least
    ###       one species/allele in a population or a community.

    ### Main function iDIP
    iDIP = function(abun, struc) {
      n = sum(abun); N = ncol(abun);
      ga = rowSums(abun);
      gp = ga[ga>0]/n;
      G = sum(-gp*log(gp));
      S = length(gp);

      H = nrow(struc);
      A = numeric(H-1); W = numeric(H-1); B = numeric(H-1);
      Diff = numeric(H-1); Prop = numeric(H-1);

      wi = colSums(abun)/n;
      W[H-1] = -sum(wi[wi>0]*log(wi[wi>0]));
      pi = sapply(1:N, function(k) abun[,k]/sum(abun[,k]));
      Ai = sapply(1:N, function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])));
      A[H-1] = sum(wi*Ai);
      if (H>2) {
        for (i in 2:(H-1)) {
          I = unique(struc[i,]); NN = length(I);
          ai = matrix(0, ncol=NN, nrow=nrow(abun));
          for (j in 1:NN) {
            II = which(struc[i,]==I[j]);
            if (length(II)==1) {ai[,j] = abun[,II];
            } else {ai[,j] = rowSums(abun[,II])}
          }
          pi = sapply(1:NN, function(k) ai[,k]/sum(ai[,k]));
          wi = colSums(ai)/sum(ai);
          W[i-1] = -sum(wi*log(wi));
          Ai = sapply(1:NN, function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])));
          A[i-1] = sum(wi*Ai);
        }
      }
      total = G-A[H-1];
      Diff[1] = (G-A[1])/W[1];
      Prop[1] = (G-A[1])/total;
      B[1] = exp(G)/exp(A[1]);
      if (H>2) {
        for (i in 2:(H-1)) {
          Diff[i] = (A[i-1]-A[i])/(W[i]-W[i-1]);
          Prop[i] = (A[i-1]-A[i])/total;
          B[i] = exp(A[i-1])/exp(A[i]);
        }
      }
      Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop;

      out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol=1)
      rownames(out) <- c(paste0("D_gamma"),
                         paste0("D_alpha", (H-1):1),
                         paste0("D_beta", (H-1):1),
                         paste0("Proportion", (H-1):1),
                         paste0("Differentiation", (H-1):1))
      return(out)
    }

Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called “Abun” and “Struc”, respectively) as described for species diversity, we also need to input a phylogenetic “Tree” in Newick tree format.

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the “Abun” matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.

    ## Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3),
                 c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    rownames(Data) = paste0("Allele",1:6)
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)),
                  paste0("Community",1:5))
    Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

    ## Run R function
    iDIPphylo(Data, Struc, Tree)

Output is given below.

                     [,1]
    Faith's PD    321.525
    mean_T         94.169
    PD_gamma      274.388
    PD_alpha2     255.194
    PD_alpha1     223.231
    PD_beta2        1.075
    PD_beta1        1.143
    PD_prop2        0.351
    PD_prop1        0.649
    PD_diff2        0.124
    PD_diff1        0.149

“Effective” in the following interpretation is meant in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith’s PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma).

(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2).

(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.

    ### R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
    ### input data:
    ###   abun:  alleles-by-population frequency matrix data (or species-by-community
    ###          frequency matrix data)
    ###   struc: level-by-population hierarchical structure matrix (or level-by-community
    ###          hierarchical structure matrix)
    ###   tree:  a Newick-format phylogenetic tree spanned by all focal species considered
    ###          in a study
    ### NOTE: iDIP cannot handle "blank" or "NA" entry. You must replace "blank"
    ###       or "NA" in your data by 0 or any numerical value. Also, there must be at least
    ###       one species/allele in a population or a community.
    ### output:
    ###   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
    ###   (2) mean_T: weighted (by species abundance) mean of the distances from root node
    ###       to each of the tips in a phylogenetic tree. For an ultrametric tree,
    ###       mean_T = tree depth.
    ###   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD
    ###       for each level.
    ###   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta
    ###       information found in Level 1 and Level 2, respectively.
    ###   (5) Phylogenetic differentiation (dissimilarity) for each level. For example,
    ###       PD_diff1 measures the mean phylogenetic dissimilarity among communities
    ###       (Level 1) within a region (Level 2); PD_diff2 measures the mean phylogenetic
    ###       dissimilarity among regions (Level 2), etc.

    ### Three packages "ade4", "ape" and "phytools" must be installed first.
    install.packages("ade4")
    library(ade4)
    install.packages("ape")
    library(ape)
    install.packages("phytools")
    library(phytools)

    ### Main function iDIPphylo
    iDIPphylo = function(abun, struc, tree) {
      phyloData <- newick2phylog(tree)
      Temp <- as.matrix(abun[names(phyloData$leaves), ])
      nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
      M = matrix(0, nrow=length(phyloData$leaves), ncol=length(nodenames),
                 dimnames=list(names(phyloData$leaves), nodenames))
      for (i in 1:length(phyloData$leaves)) {
        M[i,][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
      }

      pA = matrix(0, ncol=ncol(abun), nrow=length(nodenames),
                  dimnames=list(nodenames, colnames(abun)))
      for (i in 1:ncol(abun)) {pA[,i] = Temp[,i] %*% M}
      pB = c(phyloData$leaves, phyloData$nodes)

      n = sum(abun); N = ncol(abun);
      ga = rowSums(pA);
      gp = ga/n; TT = sum(gp*pB);
      G = sum(-pB[gp>0]*gp[gp>0]/TT*log(gp[gp>0]/TT));
      PD = sum(pB[gp>0]);

      H = nrow(struc);
      A = numeric(H-1); W = numeric(H-1); B = numeric(H-1);
      Diff = numeric(H-1); Prop = numeric(H-1);

      wi = colSums(abun)/n;
      W[H-1] = -sum(wi[wi>0]*log(wi[wi>0]));
      pi = sapply(1:N, function(k) pA[,k]/sum(abun[,k]));
      Ai = sapply(1:N, function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)));
      A[H-1] = sum(wi*Ai);

      if (H>2) {
        for (i in 2:(H-1)) {
          I = unique(struc[i,]); NN = length(I);
          pi = matrix(0, ncol=NN, nrow=nrow(pA)); ni = numeric(NN);
          for (j in 1:NN) {
            II = which(struc[i,]==I[j]);
            if (length(II)==1) {pi[,j] = pA[,II]/sum(abun[,II]); ni[j] = sum(abun[,II]);
            } else {pi[,j] = rowSums(pA[,II])/sum(abun[,II]); ni[j] = sum(abun[,II])}
          }
          # pi = sapply(1:NN, function(k) ai[,k]/sum(ai[,k]));
          wi = ni/sum(ni);
          W[i-1] = -sum(wi*log(wi));
          Ai = sapply(1:NN, function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)));
          A[i-1] = sum(wi*Ai);
        }
      }
      total = G-A[H-1];
      Diff[1] = (G-A[1])/W[1];
      Prop[1] = (G-A[1])/total;
      B[1] = exp(G)/exp(A[1]);
      if (H>2) {
        for (i in 2:(H-1)) {
          Diff[i] = (A[i-1]-A[i])/(W[i]-W[i-1]);
          Prop[i] = (A[i-1]-A[i])/total;
          B[i] = exp(A[i-1])/exp(A[i]);
        }
      }
      Gamma = exp(G)*TT; Alpha = exp(A)*TT; Diff = Diff; Prop = Prop;
      # Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop;

      out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol=1)
      # out1 = iDIP(abun, struc)
      # out = cbind(out1, out2)
      rownames(out) <- c(paste("Faith's PD"),
                         paste("mean_T"),
                         paste0("PD_gamma"),
                         paste0("PD_alpha", (H-1):1),
                         paste0("PD_beta", (H-1):1),
                         paste0("PD_prop", (H-1):1),
                         paste0("PD_diff", (H-1):1))
      return(out)
    }


given in Table 3, along with the corresponding differentiation measures. Appendix S3 presents all mathematical derivations and discusses the desirable monotonicity and “true dissimilarity” properties that our proposed differentiation measures possess.

5 | IMPLEMENTATION OF THE FRAMEWORK BY MEANS OF AN R PACKAGE

The framework described above has been implemented in the R function iDIP (information-based Diversity Partitioning), which is provided as Data S1. We also provide a short introduction with a simple example data set to explain how to obtain numerical results equivalent to those provided in Tables 4 and 5 below for the Hawaiian archipelago example data set.

The R function iDIP requires two input matrices:

1. Abundance data specifying species/alleles (rows) raw or relative abundances for each population/community (columns).

2. Structure matrix describing the hierarchical structure of spatial subdivision; see a simple example given in Data S1. There is no limit to the number of spatial subdivisions.

The output includes (i) gamma (or total) diversity, alpha and beta diversity for each level; (ii) proportion of total beta information (among aggregates) found at each level; and (iii) mean differentiation (dissimilarity) at each level.
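The two input matrices must satisfy a few conditions before they can be analysed: no blank or missing entries, and at least one species/allele in every population or community. A minimal Python sketch of such a pre-check follows; the function name, the list-of-rows layout and the wording of the messages are assumptions, and the check is not part of the authors' R function.

```python
import math

def check_idip_inputs(abun, struc):
    """Return a list of problems found; an empty list means the inputs look usable.

    abun  -- rows are species/alleles, columns are populations/communities
    struc -- rows are hierarchical levels, columns are populations/communities
    """
    problems = []
    n_pop = len(struc[0])
    if any(len(row) != n_pop for row in list(abun) + list(struc)):
        problems.append("abun and struc must have one column per population")
        return problems
    for i, row in enumerate(abun):
        for j, value in enumerate(row):
            missing = value is None or value == "" or (
                isinstance(value, float) and math.isnan(value))
            if missing:
                problems.append(f"blank/NA entry at row {i + 1}, column {j + 1}")
    if not problems:
        # Column sums are only meaningful once every entry is numeric.
        for j in range(n_pop):
            if sum(row[j] for row in abun) <= 0:
                problems.append(f"population {j + 1} has no species/alleles")
    return problems

struc = [["Ecosystem", "Ecosystem"],
         ["Region1", "Region1"],
         ["Population1", "Population2"]]
ok = check_idip_inputs([[1, 16], [0, 0], [7, 12]], struc)
with_na = check_idip_inputs([[1, None], [0, 3]], struc)
empty_population = check_idip_inputs([[1, 0], [2, 0]], struc)
```

Running a check of this kind before the analysis makes the cause of a failure explicit instead of leaving it to surface as NaN values in the entropies.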

We also provide the R function iDIPphylo, which implements an information-based decomposition of phylogenetic diversity and therefore can take into account the evolutionary history of the species being studied. This function requires the two matrices mentioned above plus a phylogenetic tree in Newick format. For interested users without knowledge of R, we also provide an online version available from https://chao.shinyapps.io/iDIP/. This interactive web application was developed using Shiny (https://shiny.rstudio.com). The webpage contains tabs providing a short introduction describing how to use the tool, along with a detailed User’s Guide, which provides proper interpretations of the output through numerical examples.

6 | SIMULATION STUDY TO SHOW THE CHARACTERISTICS OF THE FRAMEWORK

Here, we describe a simple simulation study to demonstrate the utility and numerical behaviour of the proposed framework. We considered an ecosystem composed of 32 populations divided into four hierarchical levels (ecosystem, region, subregion, population; Figure 1). The number of populations at each level was kept constant across all simulations (i.e., ecosystem with 32 populations, regions with 16 populations each and subregions with eight

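TABLE 3 Formulas for α, β and γ, along with differentiation measures, at each hierarchical level of spatial subdivision for species/allelic diversity and phylogenetic diversity. Here D = ¹D (Hill number of order q = 1 in Equation 2), PD = ¹PD (phylogenetic diversity of order q = 1 in Equation 5), T denotes the depth of an ultrametric tree, H = Shannon entropy (Equation 2), I = phylogenetic entropy (Equation 6).

Level 3: Ecosystem
    gamma
        Species/allelic diversity: $D_\gamma = \exp\left\{-\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}\right\} \equiv \exp(H_\gamma)$
        Phylogenetic diversity: $PD_\gamma = T \times \exp\left\{-\sum_{i=1}^{B} L_i\, a_{i|++} \ln a_{i|++} \,/\, T\right\} \equiv T \times \exp(I_\gamma / T)$

Level 2: Region
    gamma
        Species/allelic diversity: $D_\gamma^{(2)} = D_\gamma$
        Phylogenetic diversity: $PD_\gamma^{(2)} = PD_\gamma$
    alpha
        Species/allelic diversity: $D_\alpha^{(2)} = \exp\left(H_\alpha^{(2)}\right)$, where $H_\alpha^{(2)} = \sum_{k} w_{+k} H_{\alpha k}^{(2)}$ and $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$
        Phylogenetic diversity: $PD_\alpha^{(2)} = T \times \exp\left(I_\alpha^{(2)} / T\right)$, where $I_\alpha^{(2)} = \sum_{k} w_{+k} I_{\alpha k}^{(2)}$ and $I_{\alpha k}^{(2)} = -\sum_{i=1}^{B} L_i\, a_{i|+k} \ln a_{i|+k}$
    beta
        Species/allelic diversity: $D_\beta^{(2)} = D_\gamma^{(2)} / D_\alpha^{(2)}$
        Phylogenetic diversity: $PD_\beta^{(2)} = PD_\gamma^{(2)} / PD_\alpha^{(2)}$

Level 1: Population or community
    gamma
        Species/allelic diversity: $D_\gamma^{(1)} = D_\alpha^{(2)}$
        Phylogenetic diversity: $PD_\gamma^{(1)} = PD_\alpha^{(2)}$
    alpha
        Species/allelic diversity: $D_\alpha^{(1)} = \exp\left(H_\alpha^{(1)}\right)$, where $H_\alpha^{(1)} = \sum_{j,k} w_{jk} H_{\alpha jk}^{(1)}$ and $H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}$
        Phylogenetic diversity: $PD_\alpha^{(1)} = T \times \exp\left(I_\alpha^{(1)} / T\right)$, where $I_\alpha^{(1)} = \sum_{j,k} w_{jk} I_{\alpha jk}^{(1)}$ and $I_{\alpha jk}^{(1)} = -\sum_{i=1}^{B} L_i\, a_{i|jk} \ln a_{i|jk}$
    beta
        Species/allelic diversity: $D_\beta^{(1)} = D_\gamma^{(1)} / D_\alpha^{(1)}$
        Phylogenetic diversity: $PD_\beta^{(1)} = PD_\gamma^{(1)} / PD_\alpha^{(1)}$

Differentiation among aggregates at each level

Level 2: Among regions
        Species/allelic diversity: $\Delta_D^{(2)} = \dfrac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k} w_{+k} \ln w_{+k}}$
        Phylogenetic diversity: $\Delta_{PD}^{(2)} = \dfrac{I_\gamma - I_\alpha^{(2)}}{-T \sum_{k} w_{+k} \ln w_{+k}}$

Level 1: Population/community within region
        Species/allelic diversity: $\Delta_D^{(1)} = \dfrac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{j,k} w_{jk} \ln (w_{jk} / w_{+k})}$
        Phylogenetic diversity: $\Delta_{PD}^{(1)} = \dfrac{I_\alpha^{(2)} - I_\alpha^{(1)}}{-T \sum_{j,k} w_{jk} \ln (w_{jk} / w_{+k})}$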


populations each). Note that here we used a hierarchy with four spatial subdivisions instead of three levels as used in the presentation of the framework. This decision was based on the fact that we wanted to simplify the presentation of calculations (three levels used), and in the simulations (four levels used) we wanted to verify the performance of the framework in a more in-depth manner.

We explored six scenarios varying in the degree of genetic structuring from very strong (Figure 2, top left panel) to very weak (Figure 2, bottom right panel), and for each we generated spatially structured genetic data for 10 unlinked bi-allelic loci using an algorithm loosely based on the genetic model of Coop, Witonsky, Di Rienzo, and Pritchard (2010). More explicitly, to generate correlated allele frequencies across populations for bi-allelic loci, we drew 10 random vectors of dimension 32 from a multivariate normal distribution with mean zero and a covariance matrix corresponding to the particular genetic structure scenario being considered. To construct the covariance matrix, we first assumed that the covariance between populations decreased with distance, so that the off-diagonal elements (covariances) for closest geographic neighbours were set to 4, for the second nearest neighbours were set to 3 and so on; as such, the main diagonal values (variance) were set to 5. By multiplying the off-diagonal elements of this variance–covariance matrix by a constant (δ), we manipulated the strength of the spatial genetic structure from strong (δ = 0.1, Figure 2) to weak (δ = 6, Figure 2). Delta values were chosen to demonstrate gradual changes in estimates across diversity components. Using this procedure, we generated a matrix of random normally distributed N(0,1) deviates $\varepsilon_{il}$ for each population i and locus l. The random deviates were transformed into allele frequencies constrained between 0 and 1 using the simple transform

$$p_{il} = \begin{cases} 0 & \text{if } \varepsilon_{il} < 0, \\ \varepsilon_{il} & \text{if } 0 \le \varepsilon_{il} \le 1, \\ 1 & \text{if } \varepsilon_{il} > 1, \end{cases}$$

where $p_{il}$ is the relative frequency of allele A1 at the lth locus in population i, and therefore $q_{il} = (1 - p_{il})$ is the relative frequency of allele A2. Each bi-allelic locus was analysed separately by our framework, and estimated values of $D_\gamma$, $D_\alpha$ and $D_\beta$ for each spatial level (see Figure 1) were averaged across the 10 loci.

To simulate a realistic distribution of the number of individuals across populations, we generated random values from a log-normal distribution with mean 0 and log of standard deviation 1; these values were then multiplied by randomly generated deviates from a Poisson distribution with λ = 30 to obtain a wide range of population/community sizes. Rounded values (to mimic abundances of individuals) were then multiplied by $p_{il}$ and $q_{il}$ to generate allele abundances. Given that the number of individuals was randomly generated across populations, there is no spatial correlation in abundance of individuals across the landscape, which means that the genetic spatial patterns were solely determined by the variance–covariance matrix used to generate correlated allele frequencies across populations. This facilitates interpretation of the simulation results, allowing us to demonstrate that the framework can uncover subtle spatial effects associated with population connectivity (see below).

For each spatial structure, we generated 100 matrices of allele frequencies, and each matrix was analysed separately to obtain distributions for $D_\gamma$, $D_\alpha$, $D_\beta$ and $\Delta_D$. Figure 2 presents heat maps of the correlation in allele frequencies across populations for one simulated data set under each δ value and shows that our algorithm can generate a wide range of genetic structures comparable to those generated by other more complex simulation protocols (e.g., de Villemereuil, Frichot, Bazin, Francois, & Gaggiotti, 2014).

Figure 3 shows the distribution of $D_\alpha$, $D_\beta$ and $\Delta_D$ values for the three levels of geographic variation below the ecosystem level (i.e., $D_\gamma$ genetic diversity). The results clearly show that our framework detects differences in genetic diversity across different levels of spatial genetic structure. As expected, the effective number of alleles ($D_\alpha$ component, top row) increases per region and subregion as the spatial structure becomes weaker (i.e., from small to large δ values) but remains constant at the population level, as there is no spatial structure at this level (i.e., populations are panmictic), so diversity is independent of δ.

The $D_\beta$ component (middle row) quantifies the effective number of aggregates (regions, subregions, populations) at each hierarchical level of spatial subdivision. The larger the number of aggregates at a given level, the more heterogeneous that level is. Thus, it is also a measure of compositional dissimilarity at each level. We use this interpretation to describe the results in a more intuitive manner. As expected, as δ increases, dissimilarity between regions (middle left panel) decreases because spatial genetic structure becomes weaker, and the compositional dissimilarity among populations within subregions (middle right panel) increases because the strong spatial correlation among populations within subregions breaks down (Figure 3, centre left panel). The compositional dissimilarity between subregions within regions (Figure 3, middle centre panel) first increases and then decreases with increasing δ. This is due to an “edge effect” associated with the marginal status of the subregions at the extremes of the species range (extreme right and left subregions in Figure 2). As δ increases, the composition of the two subregions at the centre of the species range, which belong to different regions, changes more rapidly than that of the two marginal subregions. Thus, the compositional dissimilarity between subregions within regions increases. However, as δ continues to increase, spatial effects disappear and dissimilarity decreases.
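The truncation transform from normal deviates to allele frequencies, and the conversion of a rounded population size into allele abundances for a bi-allelic locus, can be sketched in Python as follows. The function names are hypothetical, and the sketch covers only these two steps, not the construction of the covariance matrix or the random draws.

```python
def to_frequency(eps):
    """Truncate a deviate to [0, 1]: 0 below the interval, 1 above it, eps inside it."""
    return min(1.0, max(0.0, eps))

def allele_abundances(eps, n_individuals):
    """Abundances of alleles A1 and A2 in a population of n_individuals.

    eps           -- deviate for this population and locus
    n_individuals -- rounded population size
    """
    p = to_frequency(eps)   # relative frequency of allele A1
    q = 1.0 - p             # relative frequency of allele A2
    return n_individuals * p, n_individuals * q

examples = [to_frequency(e) for e in (-0.3, 0.4, 1.7)]
a1, a2 = allele_abundances(0.25, 40)
```

Deviates below 0 or above 1 give a population that is fixed for one of the two alleles, which is how the transform keeps frequencies inside the unit interval.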

FIGURE 2 Heat maps of allele frequency correlations between pairs of populations for different δ values. Delta values control the strength of the spatial genetic structure among populations, with low δs having the strongest spatial correlation among populations. Each heat map represents the outcome of a single simulation, and each dot represents the allele frequency correlation between two populations. Thus, the diagonal represents the correlation of a population with itself and is always 1, regardless of the δ value considered in the simulation. Colours indicate range of correlation values. As in Figure 1, the dendrograms represent the spatial relationship (i.e., geographic distance) between populations.


The differentiation component $\Delta_D$ (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity $D_\beta$. This is expected, as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix

FIGURE 3 Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity gamma is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions).


in which different δ values would be used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength in spatial genetic variation.

For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ, from $D_\gamma$ = 1.6 on average across simulations for δ = 0.1 up to $D_\gamma$ = 1.9 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here region, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as for the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.

7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM

All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish Etelis coruscans (Andrews et al., 2014) and a shallow-water fish Zebrasoma flavescens (Eble et al., 2011).

The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string

F IGURE 4emspStudydomainspanningtheHawaiianArchipelagoandJohnstonAtollContourlinesdelineate1000and2000misobathsGreenindicates large landmass

12emsp |emsp emspensp GAGGIOTTI eT Al

ofuninhabitedlowislandsatollsshoalsandbanksthatareprimar-ilyonlyaffectedbyglobalanthropogenicstressors (climatechangeoceanacidificationandmarinedebris)(Selkoeetal2009)Inaddi-tionthenortherlylocationoftheNWHIsubjectsthereefstheretoharsher disturbance but higher productivity and these conditionslead to ecological dominance of endemics over nonendemic fishes (FriedlanderBrownJokielSmithampRodgers2003)TheHawaiianarchipelagoisgeographicallyremoteanditsmarinefaunaisconsid-erablylessdiversethanthatofthetropicalWestandSouthPacific(Randall1998)Thenearestcoralreefecosystemis800kmsouth-westof theMHI atJohnstonAtoll and is the third region consid-eredinouranalysisoftheHawaiianreefecosystemJohnstonrsquosreefarea is comparable in size to that ofMaui Island in theMHI andthefishcompositionofJohnstonisregardedasmostcloselyrelatedtotheHawaiianfishcommunitycomparedtootherPacificlocations(Randall1998)

We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.

7.1 | Species diversity

Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, $D_\gamma$, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_\alpha^{(2)}$ = 37.77; Island: $D_\alpha^{(1)}$ = 27.75). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_\beta^{(2)}$ = 1.29 represents the number of region equivalents in the Hawaiian archipelago, while $D_\beta^{(1)}$ = 1.361 is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on the sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain $\Delta_D$ (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
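The relationship among the diversity values quoted from Table 4 can be checked with a few lines of arithmetic. The sketch below assumes, as the layout of Table 4 indicates, that gamma diversity at each level equals alpha diversity times beta diversity at that level; the variable names are illustrative, and the inputs are the rounded values reported in the table.

```python
# Diversities of order q = 1 as reported in Table 4 (rounded to three decimals).
d_gamma = 48.744    # effective number of species in the archipelago
d_alpha_2 = 37.773  # mean effective number of species per region
d_alpha_1 = 27.752  # mean effective number of species per island

# Assumed multiplicative partitioning: gamma = alpha * beta at each level.
d_beta_2 = d_gamma / d_alpha_2    # region equivalents in the archipelago
d_beta_1 = d_alpha_2 / d_alpha_1  # island equivalents within a region

# Species equivalents lost on descending one level in the hierarchy.
loss_to_region = d_gamma - d_alpha_2
loss_to_island = d_alpha_2 - d_alpha_1

print(f"D_beta(2) = {d_beta_2:.3f}, D_beta(1) = {d_beta_1:.3f}")
print(f"lost per level: {loss_to_region:.2f}, {loss_to_island:.2f}")
```

Both ratios agree with the tabulated beta diversities (1.290 and 1.361) to three decimals, and both losses are close to the "approximately 10 species equivalents" mentioned in the text.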

We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than those of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and, to a lesser extent, Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
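The effect of the order q can be made concrete with the standard Hill number formula, (Σ p_i^q)^(1/(1−q)), with the q = 1 case taken as the exponential of Shannon entropy. The following is a minimal sketch, not the authors' R package: the function name is illustrative, and the two communities are invented. Only the 55.1% share of the dominant species echoes the Nihoa figure quoted above; the 40 rare species and their abundances are assumptions.

```python
import math

def hill_number(proportions, q):
    """Hill number (effective number of species) of order q."""
    p = [x for x in proportions if x > 0]
    if q == 1:
        # Limit as q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

# Hypothetical dominated community: one species at 55.1%, with the remaining
# 44.9% split evenly among 40 rare species (invented for illustration).
dominated = [0.551] + [0.449 / 40] * 40
# Hypothetical perfectly even community with the same richness.
even = [1 / 41] * 41

for name, community in (("dominated", dominated), ("even", even)):
    d0, d1, d2 = (hill_number(community, q) for q in (0, 1, 2))
    print(f"{name}: D0 = {d0:.1f}, D1 = {d1:.1f}, D2 = {d2:.1f}")
```

In the even community all three orders coincide with species richness, whereas in the dominated community the values fall steeply as q increases, because rare species count progressively less.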

Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.

7.2 | Genetic diversity

Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases, genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that, despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.

TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

Level                     Diversity
3 Hawaiian Archipelago    $D_\gamma = 48.744$
2 Region                  $D_\gamma^{(2)} = D_\gamma$;  $D_\alpha^{(2)} = 37.773$;  $D_\beta^{(2)} = 1.290$
1 Island (community)      $D_\gamma^{(1)} = D_\alpha^{(2)}$;  $D_\alpha^{(1)} = 27.752$;  $D_\beta^{(1)} = 1.361$

Differentiation among aggregates at each level
2 Region                  $\Delta_D^{(2)} = 0.290$
1 Island (community)      $\Delta_D^{(1)} = 0.153$

Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.

As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).

8 | DISCUSSION

Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications, including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.

FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities; (b) genetic diversity for Etelis coruscans; (c) genetic diversity for Zebrasoma flavescens

Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon’s entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and, therefore, give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g., bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.
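The link between heterozygosity and order q = 2 can be illustrated numerically: expected heterozygosity is H = 1 − Σ p_i², so the Hill number of order 2 equals 1/(1 − H). The sketch below uses an invented locus and a hypothetical bottleneck that removes all rare alleles; the function names and allele frequencies are assumptions made only for illustration.

```python
import math

def hill_number(p, q):
    """Hill number of order q for allele frequencies p (summing to 1)."""
    p = [x for x in p if x > 0]
    if q == 1:
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

def expected_heterozygosity(p):
    """Gene diversity: probability that two randomly drawn alleles differ."""
    return 1.0 - sum(x * x for x in p)

# Hypothetical locus before and after a bottleneck that removes rare alleles.
before = [0.46, 0.46] + [0.01] * 8  # two common and eight rare alleles
after = [0.50, 0.50]                # only the two common alleles remain

def relative_loss(measure):
    """Fraction of the pre-bottleneck value that is lost."""
    return 1.0 - measure(after) / measure(before)

loss_h = relative_loss(expected_heterozygosity)
loss_d2 = relative_loss(lambda p: hill_number(p, 2))
loss_d1 = relative_loss(lambda p: hill_number(p, 1))
loss_d0 = relative_loss(lambda p: hill_number(p, 0))
print(f"loss: H = {loss_h:.2f}, D2 = {loss_d2:.2f}, "
      f"D1 = {loss_d1:.2f}, D0 = {loss_d0:.2f}")
```

In this toy case, heterozygosity and the order-2 measure change little even though 8 of 10 alleles disappear, while the order-1 measure responds more strongly and allelic richness (order 0) most of all.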

As proved by Chao et al. (2015, Appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost’s D, do not possess either of these two properties.

Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and “true dissimilarity” properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.

FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness; (b) allelic richness for Etelis coruscans; (c) allelic richness for Zebrasoma flavescens

TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level                     Diversity
3 Hawaiian Archipelago    $D_\gamma = 8.249$
2 Region                  $D_\gamma^{(2)} = D_\gamma$;  $D_\alpha^{(2)} = 8.083$;  $D_\beta^{(2)} = 1.016$
1 Island (population)     $D_\gamma^{(1)} = D_\alpha^{(2)}$;  $D_\alpha^{(1)} = 7.077$;  $D_\beta^{(1)} = 1.117$

Differentiation among aggregates at each level
2 Region                  $\Delta_D^{(2)} = 0.023$
1 Island (community)      $\Delta_D^{(1)} = 0.062$

Recently, Karlin and Smouse (2017, Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the “true dissimilarity” property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of “true dissimilarity” can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse’s (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure “true dissimilarity.” Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse’s measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
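The two-population example can be worked through directly. The sketch below assumes equal population weights and uses the two-level form that the text says the measure reduces to, namely the normalized mutual information: the excess of pooled over mean within-population Shannon entropy, divided by the entropy of the weights. Allele labels and function names are illustrative.

```python
import math

def shannon_entropy(freqs):
    """Shannon entropy (natural log) of a dict of relative frequencies."""
    return -sum(p * math.log(p) for p in freqs.values() if p > 0)

# Populations I and II: 10 equally frequent alleles each, 4 of them shared.
pop_1 = {f"a{i}": 0.1 for i in range(0, 10)}   # alleles a0..a9
pop_2 = {f"a{i}": 0.1 for i in range(6, 16)}   # alleles a6..a15 (a6..a9 shared)
pops = [pop_1, pop_2]
weights = [0.5, 0.5]  # assumption: equal population weights

# Pooled (gamma) allele frequencies: weighted average across populations.
pooled = {}
for w, pop in zip(weights, pops):
    for allele, p in pop.items():
        pooled[allele] = pooled.get(allele, 0.0) + w * p

h_gamma = shannon_entropy(pooled)
h_alpha = sum(w * shannon_entropy(pop) for w, pop in zip(weights, pops))
h_weights = -sum(w * math.log(w) for w in weights)

# Normalized mutual information (Shannon differentiation), bounded in [0, 1].
delta = (h_gamma - h_alpha) / h_weights
n_shared = len(set(pop_1) & set(pop_2))
print(f"shared alleles: {n_shared}, differentiation: {delta:.4f}")
```

Under these assumptions the excess entropy is 0.6 ln 2, so the normalized value is 0.6, which equals 1 − A/S = 1 − 4/10.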

In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would surely be observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., obtained by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
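The negative bias of the observed (plug-in) entropy under severe undersampling is easy to reproduce by simulation. The sketch below is only an illustration of the bias; it does not implement the Chao and Jost (2015) estimator. The community (50 equally common species), the sample size and the number of replicates are all hypothetical settings.

```python
import math
import random
from collections import Counter

def plugin_entropy(sample):
    """Shannon entropy computed from sample proportions (plug-in estimate)."""
    n = len(sample)
    return -sum((c / n) * math.log(c / n) for c in Counter(sample).values())

random.seed(1)  # fixed seed so the run is repeatable
n_species, sample_size, n_reps = 50, 20, 200  # hypothetical settings

# True entropy of a community with 50 equally common species.
true_entropy = math.log(n_species)

# Draw severely undersampled collections (20 individuals from 50 species).
estimates = [
    plugin_entropy(random.choices(range(n_species), k=sample_size))
    for _ in range(n_reps)
]
mean_estimate = sum(estimates) / n_reps

print(f"true entropy: {true_entropy:.3f}")
print(f"mean plug-in entropy: {mean_estimate:.3f}")
print(f"true diversity (q = 1): {math.exp(true_entropy):.1f}")
print(f"diversity from mean plug-in entropy: {math.exp(mean_estimate):.1f}")
```

Because a sample of 20 individuals can contain at most 20 species, every plug-in estimate here is bounded by ln 20, which is below the true value of ln 50; the effective number of species is therefore underestimated in every replicate.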

Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity $D_\beta$ and differentiation $\Delta_D$ measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.

The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.

Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

Level                     Diversity
3 Hawaiian Archipelago    $D_\gamma = 8.404$
2 Region                  $D_\gamma^{(2)} = D_\gamma$;  $D_\alpha^{(2)} = 8.290$;  $D_\beta^{(2)} = 1.012$
1 Island (community)      $D_\gamma^{(1)} = D_\alpha^{(2)}$;  $D_\alpha^{(1)} = 7.690$;  $D_\beta^{(1)} = 1.065$

Differentiation among aggregates at each level
2 Region                  $\Delta_D^{(2)} = 0.014$
1 Island (community)      $\Delta_D^{(1)} = 0.033$

We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases, there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides “true diversity” measures that are consistent across levels of organization, and therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
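The percentage differences between regional and island levels can be recomputed from the alpha diversities in Tables 4 to 6. The sketch below assumes that each percentage is the drop from the regional to the island alpha diversity relative to the regional value; this definition is an inference from the numbers, not something stated in the text, and the dictionary keys are illustrative labels.

```python
# Alpha diversities of order q = 1 from Tables 4-6: (regional, island).
alpha = {
    "fish species (Table 4)": (37.773, 27.752),
    "E. coruscans alleles (Table 5)": (8.083, 7.077),
    "Z. flavescens alleles (Table 6)": (8.290, 7.690),
}

def relative_drop(regional, island):
    """Assumed definition: loss from region to island, relative to the region."""
    return (regional - island) / regional

drops = {name: relative_drop(*values) for name, values in alpha.items()}
for name, drop in drops.items():
    print(f"{name}: {100 * drop:.2f}%")
```

Under this assumption the three values come out close to the 26%, 12.44% and 7% quoted in the text, which supports reading them as relative losses in effective numbers.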

In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)’s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)’s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon’s entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.

There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithms may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and “true dissimilarity” properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.

The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.

An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.


ACKNOWLEDGEMENTS

This work was assisted through participation in “Next Generation Genetic Monitoring” Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center’s Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.

DATA ARCHIVING STATEMENT

All data used in this manuscript are available in DRYAD (https://doi.org/dx.doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).

ORCID

Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493

Christine Edwards http://orcid.org/0000-0001-8837-4872

REFERENCES

Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2

Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101

Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.

Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis “marshi” (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665

Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.

Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.

Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654

Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551

Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247

Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272

Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540

Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349

Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471

Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014

Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819

Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083

Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1

Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2

Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036

Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75

Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352

Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x

Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1

Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x

Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1

Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x

Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum × falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.


Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.

Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.

Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3

Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x

Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.

Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299

Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x

Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x

Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.

Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x

Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.

Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.

Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082

Scheiner S M Kosman E Presley S J ampWillig M R (2017a) Thecomponents of biodiversitywith a particular focus on phylogeneticinformation Ecology and Evolution 7(16) 6444ndash6454 httpsdoiorg101002ece33199

Scheiner S M Kosman E Presley S J amp Willig M R (2017b)Decomposing functional diversityMethods in Ecology and Evolution 8(7)809ndash820httpsdoiorg1011112041-210X12696

SelkoeKAGaggiottiOETremlEAWrenJLKDonovanMKToonenRJampConsortiuHawaiiReefConnectivity(2016)TheDNAofcoralreefbiodiversityPredictingandprotectinggeneticdiversityofreef assemblages Proceedings of the Royal Society B- Biological Sciences 283(1829)

Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z

Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765

Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x

Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.

Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422

Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756

de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705

Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.

Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x

Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516

Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.

Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593

APPENDIX S1: Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
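The equivalence stated in Appendix S1 can be checked numerically. The sketch below uses the allele frequencies from the table; the helper names `expected_het` and `effective_alleles` are illustrative and are not part of the paper's code. It assumes the usual definition He = 1 − Σ p_i² and takes the heterozygosity-based effective number of alleles to be 1 / Σ p_i².

```r
# Allele frequencies from the table in Appendix S1
pop1 <- c(A1 = 0.5,  A2 = 0.5,  A3 = 0,     A4 = 0,     A5 = 0)
pop2 <- c(A1 = 0.01, A2 = 0.10, A3 = 0.665, A4 = 0.215, A5 = 0.01)

# Expected heterozygosity: He = 1 - sum(p_i^2)
expected_het <- function(p) 1 - sum(p^2)

# Heterozygosity-based effective number of alleles: the number of
# equally frequent alleles that would give the same He
effective_alleles <- function(p) 1 / sum(p^2)

he <- c(pop1 = expected_het(pop1), pop2 = expected_het(pop2))
ne <- c(pop1 = effective_alleles(pop1), pop2 = effective_alleles(pop2))

print(round(he, 3))
print(round(ne, 2))
```

Both populations give He close to 0.5 and an effective number close to two, even though population 2 carries five distinct alleles.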

APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol                      Definition
${}^{q}D$                   abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$                  phylogenetic diversity of order q
$S$                         total number of distinct elements (species/alleles)
$H$                         Shannon entropy
$K$                         number of regions
$J_k$                       number of local populations/communities within region k
$w_{jk}$                    weight given to population/community j of region k
$w_{+k}$                    weight given to region k
$D_\alpha^{(l)}$            alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$             beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$            gamma abundance diversity at level l of the hierarchy
$l$                         superscript to denote population/community, subregion, region, …
$D_\gamma$                  gamma abundance diversity at the ecosystem level
$N_{injk}$                  number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$                   total number of copies of allele/species i in population/community j of region k
$N_{+jk}$                   total number of alleles/individuals in population j of region k
$N_{++k}$                   total number of alleles/individuals in region k
$N_{+++}$                   total number of alleles/individuals in the ecosystem
$p_{i|jk}$                  frequency of allele/species i in population j of region k
$p_{i|+k}$                  frequency of allele/species i in region k
$p_{i|++}$                  frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$       alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$        alpha entropy for region k
$H_\alpha^{(l)}$            total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$            abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$                         number of branch segments in the phylogenetic tree
$L_i$                       length of branch i = 1, 2, 3, …, B
$a_i$                       total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$                   mean branch length
$T$                         depth of an ultrametric tree ($= \bar{T}$)
$Q$                         Rao's quadratic entropy
$I$                         phylogenetic entropy
$a_{i|jk}$                  total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$                  total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$                  total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$           alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$            beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$           gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$                 gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$       alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$        alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$            total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$         phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)

APPENDIX S3: Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \ln x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)\Big] \\
&\le -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\big(w_j p_{i|j}\big)\Big] \\
&= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j \\
&= H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.
\end{aligned}$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
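The two bounds of the theorem can be illustrated numerically. The following is a minimal sketch with made-up frequencies and weights; the function names `shannon` and `two_level_entropies` are hypothetical helpers, not taken from the paper's code. It evaluates the excess of the gamma entropy over the alpha entropy for identical populations, for populations with no shared alleles, and for an intermediate case.

```r
# Shannon entropy of a frequency vector (0 * log(0) is treated as 0)
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# Gamma and alpha entropies for a two-level hierarchy.
# P: alleles x populations matrix of relative frequencies (columns sum to 1)
# w: population weights (sum to 1)
two_level_entropies <- function(P, w) {
  p_pooled <- as.vector(P %*% w)            # p_{i|+} = sum_j w_j p_{i|j}
  c(gamma = shannon(p_pooled),
    alpha = sum(w * apply(P, 2, shannon)),
    upper = shannon(w))                     # -sum_j w_j ln w_j
}

w <- c(0.2, 0.3, 0.5)

# Case 1: identical populations, so the excess should be 0
P_same <- matrix(rep(c(0.6, 0.3, 0.1), 3), ncol = 3)

# Case 2: no shared alleles, so the excess should reach the upper bound
P_distinct <- rbind(c(0.7, 0, 0), c(0.3, 0, 0),
                    c(0, 1, 0),
                    c(0, 0, 0.5), c(0, 0, 0.5))

# Case 3: partial overlap, so the excess should fall between the bounds
P_mixed <- cbind(c(0.5, 0.5, 0), c(0.2, 0.3, 0.5), c(0, 0.1, 0.9))

for (P in list(P_same, P_distinct, P_mixed)) {
  e <- two_level_entropies(P, w)
  print(round(c(excess = e[["gamma"]] - e[["alpha"]], bound = e[["upper"]]), 4))
}
```

Dividing the excess by the bound gives a quantity between 0 and 1, which is the normalisation used for the Shannon differentiation measure.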

From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation measure is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: If multiple communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.
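Property (B) lends itself to a direct numerical check. The sketch below builds N hypothetical communities with S equally common species, A of which are shared by all, and evaluates the normalised ratio of Eq. (C1). The helper names `make_communities` and `shannon_differentiation` and the values N = 4, S = 10, A = 3 are illustrative assumptions; equal weights are used only for simplicity.

```r
# Shannon entropy of a frequency vector (0 * log(0) is treated as 0)
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# Build N communities, each with S equally common species, of which
# A are shared by all communities and S - A are unique to each community.
make_communities <- function(N, S, A) {
  n_species <- A + N * (S - A)
  P <- matrix(0, nrow = n_species, ncol = N)
  if (A > 0) P[seq_len(A), ] <- 1 / S                  # shared species
  if (S > A) {
    for (j in seq_len(N)) {
      own <- A + (j - 1) * (S - A) + seq_len(S - A)    # species unique to community j
      P[own, j] <- 1 / S
    }
  }
  P
}

# Normalised Shannon differentiation: (H_gamma - H_alpha) / (-sum w ln w)
shannon_differentiation <- function(P, w) {
  H_gamma <- shannon(as.vector(P %*% w))
  H_alpha <- sum(w * apply(P, 2, shannon))
  (H_gamma - H_alpha) / shannon(w)
}

N <- 4; S <- 10; A <- 3
P <- make_communities(N, S, A)
w <- rep(1 / N, N)
delta <- shannon_differentiation(P, w)

print(c(delta = delta, expected = 1 - A / S))
```

In this construction the two printed values should agree, i.e. the measure recovers the proportion of non-shared species in a community.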

Proof. For Part (A), see Appendix S6 in Chao, Jost et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information; $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region; and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} w_{+k} p_{i|+k}\Big) \ln\Big(\sum_{m=1}^{K} w_{+m} p_{i|+m}\Big).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele i in population j, with population weights $(w_{1k}/w_{+k}, w_{2k}/w_{+k}, \ldots, w_{J_k k}/w_{+k})$, $j = 1, 2, \ldots, J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \tag{C6}$$

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma diversity as

$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\big(w_{jk} p_{i|jk}\big)\Big] \\
&= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big(\ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big)\Big] \\
&= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk}\Big] - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \frac{w_{jk}}{w_{+k}}\Big] - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k}\Big] \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big].
\end{aligned}$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
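The three-level decomposition and the two normalised measures can be traced step by step on a small data set. The sketch below uses the allele counts of the five-population example given in the R code section of this supplement (two populations in Region 1, three in Region 2) and takes relative population sizes as weights. Variable names are illustrative; the paper's own implementation is the function iDIP.

```r
# Shannon entropy of a frequency vector (0 * log(0) is treated as 0)
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# Allele counts (rows) in five populations (columns)
abun <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)              # region membership of each population

n    <- sum(abun)
w_jk <- colSums(abun) / n                            # population weights
w_k  <- as.numeric(tapply(w_jk, region, sum))        # region weights w_{+k}

# Pool counts within regions: alleles x regions
abun_reg <- sapply(sort(unique(region)),
                   function(k) rowSums(abun[, region == k, drop = FALSE]))

# Gamma entropy and the two alpha entropies
H_gamma  <- shannon(rowSums(abun) / n)
H_alpha2 <- sum(w_k  * apply(abun_reg, 2, function(x) shannon(x / sum(x))))
H_alpha1 <- sum(w_jk * apply(abun,     2, function(x) shannon(x / sum(x))))

# Normalised differentiation among regions and among populations within regions
diff_regions <- (H_gamma - H_alpha2) / shannon(w_k)
diff_pops    <- (H_alpha2 - H_alpha1) / (-sum(w_jk * log(w_jk / w_k[region])))

print(round(c(D_gamma = exp(H_gamma), D_alpha2 = exp(H_alpha2),
              D_alpha1 = exp(H_alpha1),
              Differentiation2 = diff_regions,
              Differentiation1 = diff_pops), 3))
```

The printed values can be compared with the iDIP output reported for the same data in the R code section (D_gamma = 5.272, D_alpha2 = 4.679, D_alpha1 = 3.540, Differentiation2 = 0.204, Differentiation1 = 0.310).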

As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i\, a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i\, a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i\, a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[I_\alpha^{(2)} - I_\alpha^{(1)}\big] + \big[I_\gamma - I_\alpha^{(2)}\big]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information; $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region; and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
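To make the phylogenetic quantities concrete, the sketch below computes node abundances, the phylogenetic entropy I and the order-1 phylogenetic diversity for a single assemblage on a small ultrametric tree. The five-tip topology, the branch lengths and the tip abundances are illustrative assumptions, not data from the paper. The conversion PD = T·exp(I/T) follows the way the iDIPphylo code in this supplement turns entropies into effective total branch length.

```r
# Hypothetical ultrametric tree of depth 4 with five tips.
# Branch order: tips 1-5, then the internal branches above the nodes
# (1,2), (1,2,3) and (4,5). Branch lengths are illustrative assumptions.
L <- c(1, 1, 2, 1.5, 1.5, 1, 2, 2.5)

# Tip-by-branch incidence: entry (s, b) is 1 if tip s descends from branch b
M <- cbind(diag(5),
           c(1, 1, 0, 0, 0),     # branch above node (1,2)
           c(1, 1, 1, 0, 0),     # branch above node (1,2,3)
           c(0, 0, 0, 1, 1))     # branch above node (4,5)

p <- c(0.4, 0.1, 0.2, 0.2, 0.1)  # tip relative abundances (sum to 1)
a <- as.vector(p %*% M)          # node abundances a_i

# Mean branch length; equals the tree depth T for an ultrametric tree
T_bar <- sum(L * a)

# Phylogenetic entropy I = -sum_i L_i a_i ln a_i
I_phylo <- -sum(L * a * log(a))

# Order-1 phylogenetic diversity (effective total branch length) and the
# corresponding effective number of equally divergent lineages
PD1      <- T_bar * exp(I_phylo / T_bar)
lineages <- PD1 / T_bar

print(c(T_bar = T_bar, I = I_phylo, PD1 = PD1,
        lineages = lineages, total_branch_length = sum(L)))
```

The effective total branch length cannot exceed the total branch length of the tree, and the effective number of lineages lies between 1 and the number of tips.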

SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represents the ecosystem. Shannon entropies ($H$) are calculated from allele/species abundances at each level of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 layout: rows for levels 3 (Ecosystem), 2 (Region) and 1 (Population or Community); columns Within, Between, Total and Decomposition. Populations carry frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$ with entropies $H_{\alpha 11}^{(1)}, \ldots, H_{\alpha 52}^{(1)}$; regions carry $p_{i|+1}$, $p_{i|+2}$ with entropies $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$; the ecosystem carries $p_{i|++}$ with entropy $H_\gamma$.]

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2 layout: tips labelled $p_1$ to $p_5$; internal nodes labelled $p_1 + p_2$, $p_4 + p_5$ and $p_1 + p_2 + p_3$; root labelled MRCA.]

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entry. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1: Population 1, Population 2
    Region 2: Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

          Pop1  Pop2  Pop3  Pop4  Pop5
Allele1      1    16     2    10    15
Allele2      0     0     0     5    14
Allele3      7    12    11     1     0
Allele4      0     5    14     1    21
Allele5      2     1     0    11    10
Allele6      0     1     3     2     0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

          Pop1         Pop2         Pop3         Pop4         Pop5
Level 3   Ecosystem    Ecosystem    Ecosystem    Ecosystem    Ecosystem
Level 2   Region1      Region1      Region2      Region2      Region2
Level 1   Population1  Population2  Population3  Population4  Population5
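Before running the analysis it can help to confirm that the two input matrices satisfy the requirements stated above. The following is a minimal sketch: `check_idip_input` is a hypothetical helper written for this illustration and is not part of the iDIP code. It reports NA entries, empty populations, mismatched dimensions and units that are not nested within a single unit of the level above.

```r
# Example input: allele counts (rows) by population (columns)
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))

# Structure matrix: highest level in row 1, population level in the last row
Struc <- rbind(rep("Ecosystem", 5),
               c(rep("Region1", 2), rep("Region2", 3)),
               paste0("Population", 1:5))

# Hypothetical helper (not part of iDIP): returns a character vector of
# problems found, or character(0) if the input looks usable.
check_idip_input <- function(abun, struc) {
  abun <- as.matrix(abun)
  problems <- character(0)
  if (anyNA(abun))
    problems <- c(problems, "abun contains NA entries")
  if (ncol(struc) != ncol(abun))
    problems <- c(problems, "struc and abun differ in number of populations")
  if (any(colSums(abun, na.rm = TRUE) <= 0))
    problems <- c(problems, "at least one population has no species/alleles")
  if (length(unique(struc[1, ])) != 1)
    problems <- c(problems, "the top level should contain a single unit")
  # each unit must be nested within exactly one unit of the level above
  for (i in seq_len(nrow(struc))[-1]) {
    parents <- tapply(struc[i - 1, ], struc[i, ], function(x) length(unique(x)))
    if (any(parents > 1))
      problems <- c(problems,
                    sprintf("row %d of struc is not nested within row %d", i, i - 1))
  }
  problems
}

print(check_idip_input(Data, Struc))
```

An empty result indicates that none of the listed problems was detected; it does not guarantee that the hierarchy matches the intended sampling design.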

For the above simple example, input data for R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

# Run R function
iDIP(Data, Struc)

Output is given below:

                    [,1]
D_gamma            5.272
D_alpha2           4.679
D_alpha1           3.540
D_beta2            1.127
D_beta1            1.322
Proportion2        0.300
Proportion1        0.700
Differentiation2   0.204
Differentiation1   0.310

We give simple interpretations for the above output (all the "effective" is in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 is interpreted as that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha2 = 4.679 is interpreted as that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 × 1.127 = 5.272 (= D_gamma).

(1c) D_alpha1 = 3.540 is interpreted as that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 is interpreted as that there are 1.32 population equivalents per region. Here 1.322 × 3.540 = 4.679 allele equivalents per region (= D_alpha2).

(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.

R code for Information-based Diversity Partitioning for Any Number of Levels

# input data:
#   abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
# output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level
#   (2) Proportion of total beta information found at each level
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entry. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at
#       least one species/allele in a population or a community.

# Main function iDIP

iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga > 0] / n                  # pooled (ecosystem) frequencies
  G = sum(-gp * log(gp))               # gamma entropy
  S = length(gp)

  H = nrow(struc)                      # number of hierarchical levels
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  # lowest level (populations/communities)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H - 1] = sum(wi * Ai)

  # intermediate levels: pool abundances within each aggregate
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          ai[, j] = abun[, II]
        } else {
          ai[, j] = rowSums(abun[, II])
        }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i - 1] = sum(wi * Ai)
    }
  }

  total = G - A[H - 1]                 # total beta information
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }

  Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop

  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha", (H - 1):1),
                     paste0("D_beta", (H - 1):1),
                     paste0("Proportion", (H - 1):1),
                     paste0("Differentiation", (H - 1):1))
  return(out)
}

Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described in the species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree format. iDIP cannot handle "blank" or "NA" entry. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying hierarchical structure matrix; see the simple example given above for the species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundances data as in the species diversity for illustration. A simulated phylogenetic tree for 6 species is given below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) = paste0("Allele",1:6)
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
Tree = c("(((Allele1:1.666254448,Allele2:2.886156926):4.370264926,Allele3:5.919367445):4.349065302,(Allele4:0.967060281,Allele5:4.965919121,Allele6:1.5361314):5.492297125);")

# Run R function
iDIPphylo(Data, Struc, Tree)

Output is given below:

                 [,1]
Faith's PD    32.1525
mean_T         9.4169
PD_gamma      27.4388
PD_alpha2     25.5194
PD_alpha1     22.3231
PD_beta2       1.075
PD_beta1       1.143
PD_prop2       0.351
PD_prop1       0.649
PD_diff2       0.124
PD_diff1       0.149

The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 32.1525.

(2) The weighted (by species abundance) mean of the distances from root node to each of the tips is 9.4169.

(3a) PD_gamma = 27.4388 is interpreted as that the effective total branch length in the ecosystem (total phylogenetic diversity) is 27.4388.

(3b) PD_alpha2 = 25.5194 is interpreted as that the effective total branch length per region is 25.5194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 25.5194 × 1.075 = 27.4388 (= PD_gamma).

(3c) PD_alpha1 = 22.3231 is interpreted as that the effective total branch length per population within each region is 22.3231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 22.3231 × 1.143 = 25.5194 (= PD_alpha2).

(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.

R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

# input data:
#   abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
#   tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study
# NOTE: iDIP cannot handle "blank" or "NA" entry. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at
#       least one species/allele in a population or a community.
# output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
#   (2) mean_T: weighted (by species abundance) mean of the distances from root
#       node to each of the tips in a phylogenetic tree. For an ultrametric tree,
#       mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD
#       for each level
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta
#       information found in Level 1 and Level 2, respectively
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example,
#       PD_diff1 measures the mean phylogenetic dissimilarity among communities
#       (Level 1) within a region (Level 2); PD_diff2 measures the mean
#       phylogenetic dissimilarity among regions (Level 2), etc.

# Three packages "ade4", "ape" and "phytools" must be installed first.
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

# Main function iDIPphylo

iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[s, b] = 1 if leaf s descends from node/branch b
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }

  # node abundances in each population/community
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
  pB = c(phyloData$leaves, phyloData$nodes)        # branch lengths
  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n
  TT = sum(gp * pB)                                # mean_T
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])                             # Faith's PD

  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H - 1] = sum(wi * Ai)

  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else {
          pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
        }
      }
      # kept inactive: refers to `ai`, which is not defined in this function
      # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = ni / sum(ni)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i - 1] = sum(wi * Ai)
    }
  }

  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }

  # kept inactive: alternative scaling in units of effective lineages
  # Gamma = exp(G) / TT; Alpha = exp(A) / TT; Diff = Diff; Prop = Prop
  Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop

  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  # kept inactive: `out2` is not defined in this function
  # out1 = iDIP(abun, struc)
  # out = cbind(out1, out2)
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H - 1):1),
                     paste0("PD_beta", (H - 1):1),
                     paste0("PD_prop", (H - 1):1),
                     paste0("PD_diff", (H - 1):1))
  return(out)
}


populations each) Note that herewe used a hierarchywith fourspatialsubdivisionsinsteadofthreelevelsasusedinthepresenta-tion of the framework This decision was based on the fact thatwewantedtosimplifythepresentationofcalculations(threelevelsused)andinthesimulations(fourlevelsused)wewantedtoverifytheperformanceoftheframeworkinamorein-depthmanner

We explored six scenarios varying in the degree of genetic structuring, from very strong (Figure 2, top left panel) to very weak (Figure 2, bottom right panel), and for each we generated spatially structured genetic data for 10 unlinked bi-allelic loci using an algorithm loosely based on the genetic model of Coop, Witonsky, Di Rienzo, and Pritchard (2010). More explicitly, to generate correlated allele frequencies across populations for bi-allelic loci, we drew 10 random vectors of dimension 32 from a multivariate normal distribution with mean zero and a covariance matrix corresponding to the particular genetic structure scenario being considered. To construct the covariance matrix, we first assumed that the covariance between populations decreased with distance, so that the off-diagonal elements (covariances) for closest geographic neighbours were set to 4, those for the second nearest neighbours were set to 3, and so on; the main diagonal values (variance) were set to 5. By multiplying the off-diagonal elements of this variance–covariance matrix by a constant (δ), we manipulated the strength of the spatial genetic structure from strong (δ = 0.1; Figure 2) to weak (δ = 6; Figure 2). Delta values were chosen to demonstrate gradual changes in estimates across diversity components. Using this procedure, we generated a matrix of random normally distributed N(0,1) deviates ε_il for each population i and locus l. The random deviates were transformed into allele frequencies constrained between 0 and 1 using the simple transform

where p_il is the relative frequency of allele A1 at the l-th locus in population i, and therefore q_il = (1 − p_il) is the relative frequency of allele A2. Each bi-allelic locus was analysed separately by our framework, and estimated values of Dγ, Dα and Dβ for each spatial level (see Figure 1) were averaged across the 10 loci.

To simulate a realistic distribution of the number of individuals across populations, we generated random values from a log-normal distribution with mean 0 and log of standard deviation 1; these values were then multiplied by randomly generated deviates from a Poisson distribution with λ = 30 to obtain a wide range of population/community sizes. Rounded values (to mimic abundances of individuals) were then multiplied by p_il and q_il to generate allele abundances. Given that the number of individuals was randomly generated across populations, there is no spatial correlation in abundance of individuals across the landscape, which means that the genetic spatial patterns were solely determined by the variance–covariance matrix used to generate correlated allele frequencies across populations. This facilitates interpretation of the simulation results, allowing us to demonstrate that the framework can uncover subtle spatial effects associated with population connectivity (see below).
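The simulation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names `chain_covariance` and `simulate_allele_counts` are hypothetical, populations are placed on a simple line rather than in the hierarchical layout of Figure 2, covariances that would fall below zero are floored at 0, and no rescaling of the deviates is attempted. The example uses δ = 0.1, for which this particular matrix is diagonally dominant and therefore a valid covariance matrix.

```python
import numpy as np

def chain_covariance(n_pops, delta, variance=5.0, nearest=4.0):
    """Covariance decaying by 1 per step of distance (4, 3, 2, ...), floored at 0.

    Assumption: populations sit on a line; off-diagonals are scaled by delta.
    """
    idx = np.arange(n_pops)
    dist = np.abs(idx[:, None] - idx[None, :])
    cov = np.clip(nearest + 1.0 - dist, 0.0, None) * delta
    np.fill_diagonal(cov, variance)  # main diagonal (variance) stays at 5
    return cov

def simulate_allele_counts(n_pops=32, n_loci=10, delta=0.1, seed=1):
    rng = np.random.default_rng(seed)
    cov = chain_covariance(n_pops, delta)
    # One vector of correlated deviates per locus (rows: loci, columns: populations).
    eps = rng.multivariate_normal(np.zeros(n_pops), cov, size=n_loci)
    # Clamp transform: 0 below 0, 1 above 1, the deviate itself in between.
    p = np.clip(eps, 0.0, 1.0)
    # Population sizes: log-normal deviates times Poisson(30) deviates, rounded.
    sizes = np.rint(rng.lognormal(0.0, 1.0, n_pops) * rng.poisson(30, n_pops))
    counts_a1 = sizes * p          # abundance of allele A1
    counts_a2 = sizes * (1.0 - p)  # abundance of allele A2
    return p, sizes, counts_a1, counts_a2

p, sizes, counts_a1, counts_a2 = simulate_allele_counts()
```

For large δ the scaled off-diagonal elements exceed the diagonal, so a matrix built this way would need to be checked for positive semi-definiteness before sampling.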

For each spatial structure, we generated 100 matrices of allele frequencies, and each matrix was analysed separately to obtain distributions for Dγ, Dα, Dβ and ΔD. Figure 2 presents heat maps of the correlation in allele frequencies across populations for one simulated data set under each δ value and shows that our algorithm can generate a wide range of genetic structures comparable to those generated by other, more complex simulation protocols (e.g., de Villemereuil, Frichot, Bazin, Francois, & Gaggiotti, 2014).

Figure 3 shows the distribution of Dα, Dβ and ΔD values for the three levels of geographic variation below the ecosystem level (i.e., Dγ, genetic diversity). The results clearly show that our framework detects differences in genetic diversity across different levels of spatial genetic structure. As expected, the effective number of alleles (Dα component, top row) increases per region and subregion as the spatial structure becomes weaker (i.e., from small to large δ values) but remains constant at the population level, as there is no spatial structure at this level (i.e., populations are panmictic), so diversity is independent of δ.

The Dβ component (middle row) quantifies the effective number of aggregates (regions, subregions, populations) at each hierarchical level of spatial subdivision. The larger the number of aggregates at a given level, the more heterogeneous that level is. Thus, it is also a measure of compositional dissimilarity at each level. We use this interpretation to describe the results in a more intuitive manner. As expected, as δ increases, dissimilarity between regions (middle left panel) decreases because spatial genetic structure becomes weaker, and the compositional dissimilarity among populations within subregions (middle right panel) increases because the strong spatial correlation among populations within subregions breaks down (Figure 3, centre left panel). The compositional dissimilarity between subregions within regions (Figure 3, middle centre panel) first increases and then decreases with increasing δ. This is due to an "edge effect" associated with the marginal status of the subregions at the extremes of the species range (extreme right and left subregions in Figure 2). As δ increases, the composition of the two subregions at the centre of the species range, which belong to different regions, changes more rapidly than that of the two marginal subregions. Thus, the compositional dissimilarity between subregions within regions increases. However, as δ continues to increase, spatial effects disappear and dissimilarity decreases.

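$$
p_{il} =
\begin{cases}
0 & \text{if } \varepsilon_{il} < 0 \\
\varepsilon_{il} & \text{if } 0 \le \varepsilon_{il} \le 1 \\
1 & \text{if } \varepsilon_{il} > 1
\end{cases}
$$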

FIGURE 2 Heat maps of allele frequency correlations between pairs of populations for different δ values. Delta values control the strength of the spatial genetic structure among populations, with low δs having the strongest spatial correlation among populations. Each heat map represents the outcome of a single simulation, and each dot represents the allele frequency correlation between two populations. Thus, the diagonal represents the correlation of a population with itself and is always 1 regardless of the δ value considered in the simulation. Colours indicate range of correlation values. As in Figure 1, the dendrograms represent the spatial relationship (i.e., geographic distance) between populations.

The differentiation component ΔD (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity Dβ. This is expected, as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix

FIGURE 3 Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity gamma is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions).

in which different δ values would be used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength in spatial genetic variation.

For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ: Dγ = 1.6 on average across simulations for δ = 0.1, up to Dγ = 1.9 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here, region, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.

7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM

All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish, Etelis coruscans (Andrews et al., 2014), and a shallow-water fish, Zebrasoma flavescens (Eble et al., 2011).

The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string

FIGURE 4 Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass.

of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km southwest of the MHI at Johnston Atoll and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).

We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.

7.1 | Species diversity

Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, Dγ, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: Dα(2) = 37.77; Island: Dα(1) = 27.75). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity Dβ(2) = 1.29 represents the number of region equivalents in the Hawaiian archipelago, while Dβ(1) = 1.361 is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain ΔD (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
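To make the decomposition concrete, the following Python sketch computes gamma, alpha and beta diversities of order q = 1 and the normalized differentiation ΔD for a three-level hierarchy (ecosystem, regions, islands). It follows the Shannon-entropy relations used in the appendix code (beta as a ratio of effective numbers, ΔD as an entropy excess divided by the corresponding difference in weight entropies); the function names `shannon` and `decompose` and the toy abundance matrix are assumptions made for illustration, and the sketch assumes at least two regions, every island non-empty and at least one region with more than one island.

```python
import numpy as np

def shannon(p):
    """Shannon entropy (natural log) of a vector of proportions."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def decompose(abun, region_of_island):
    """abun: species x islands counts; region_of_island: one region label per island."""
    abun = np.asarray(abun, dtype=float)
    labels = np.asarray(region_of_island)
    n = abun.sum()
    h_gamma = shannon(abun.sum(axis=1) / n)

    def level(groups):
        # Weighted mean entropy of the aggregates and entropy of their weights.
        w = np.array([abun[:, g].sum() / n for g in groups])
        h = np.array([shannon(abun[:, g].sum(axis=1) / abun[:, g].sum()) for g in groups])
        return float((w * h).sum()), shannon(w)

    regions = [np.flatnonzero(labels == r) for r in np.unique(labels)]
    islands = [np.array([j]) for j in range(abun.shape[1])]
    h_a2, w2 = level(regions)   # level 2: regions
    h_a1, w1 = level(islands)   # level 1: islands
    return {
        "D_gamma": np.exp(h_gamma),
        "D_alpha2": np.exp(h_a2), "D_beta2": np.exp(h_gamma - h_a2),
        "D_alpha1": np.exp(h_a1), "D_beta1": np.exp(h_a2 - h_a1),
        "Delta2": (h_gamma - h_a2) / w2,
        "Delta1": (h_a2 - h_a1) / (w1 - w2),
    }

# Toy data: two regions with no species in common, identical islands within each region.
toy = [[10, 10, 0, 0],
       [10, 10, 0, 0],
       [0, 0, 10, 10],
       [0, 0, 10, 10]]
result = decompose(toy, ["A", "A", "B", "B"])
```

In this toy case the two regions behave as two fully distinct region equivalents (beta of 2 and differentiation of 1 at the regional level), while islands within a region are identical (beta of 1 and differentiation of 0).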

We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than those of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and, to a lesser extent, Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
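The comparison of orders can be illustrated with a short Python sketch of the standard Hill number formula. The function name `hill_number` and the two toy communities are hypothetical and are not taken from the Hawaiian data; the example only shows how the gap between q = 0 and q = 1, 2 reflects rare and dominant elements.

```python
import numpy as np

def hill_number(abundances, q):
    """Hill number (effective number of species or alleles) of order q."""
    p = np.asarray(abundances, dtype=float)
    p = p[p > 0] / p.sum()
    if np.isclose(q, 1.0):
        # Limit q -> 1: exponential of Shannon entropy.
        return float(np.exp(-(p * np.log(p)).sum()))
    return float((p ** q).sum() ** (1.0 / (1.0 - q)))

# Hypothetical communities with 10 species each.
dominated = [55, 5, 5, 5, 5, 5, 5, 5, 5, 5]   # one species holds 55% of individuals
even = [10] * 10                               # all species equally common

profile_dominated = [hill_number(dominated, q) for q in (0, 1, 2)]
profile_even = [hill_number(even, q) for q in (0, 1, 2)]
```

For the even community all three orders equal the species count, whereas for the dominated community the values decline with q because rare species contribute less and less.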

Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.

7.2 | Genetic diversity

Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases, genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed

TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

Level                      Diversity
3 Hawaiian Archipelago     Dγ = 48.744
2 Region                   Dγ(2) = Dγ; Dα(2) = 37.773; Dβ(2) = 1.290
1 Island (community)       Dγ(1) = Dα(2); Dα(1) = 27.752; Dβ(1) = 1.361

Differentiation among aggregates at each level
2 Region                   ΔD(2) = 0.290
1 Island (community)       ΔD(1) = 0.153

among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that, despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and therefore may lead to large differences in larval dispersal potential between the two species.

Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.

As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).

8 | DISCUSSION

Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications, including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.

Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and therefore give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g.,

FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities; (b) genetic diversity for Etelis coruscans; (c) genetic diversity for Zebrasoma flavescens.

Panel titles: (a) Species diversity; (b) E. coruscans; (c) Z. flavescens

bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.

As proved by Chao et al. (2015, Appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population, and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost's D, do not possess either of these two properties.

Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and

FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness; (b) allelic richness for Etelis coruscans; (c) allelic richness for Zebrasoma flavescens.

Panel titles: (a) Species diversity; (b) Genetic diversity, E. coruscans; (c) Genetic diversity, Z. flavescens

TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level                      Diversity
3 Hawaiian Archipelago     Dγ = 8.249
2 Region                   Dγ(2) = Dγ; Dα(2) = 8.083; Dβ(2) = 1.016
1 Island (population)      Dγ(1) = Dα(2); Dα(1) = 7.077; Dβ(1) = 1.117

Differentiation among aggregates at each level
2 Region                   ΔD(2) = 0.023
1 Island (community)       ΔD(1) = 0.062

our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.

Recently, Karlin and Smouse (2017, Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
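The two-population example can be checked numerically. The Python sketch below assumes equal population weights, for which the two-level differentiation is the excess of the pooled Shannon entropy over the mean within-population entropy, divided by log 2 (the entropy of the weights); the variable names and the allele indexing are illustrative choices, and the sketch does not reproduce Karlin and Smouse's measure.

```python
import numpy as np

def shannon(p):
    """Shannon entropy (natural log) of a vector of proportions."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# 16 distinct alleles overall: population I carries alleles 0..9,
# population II carries alleles 6..15, so alleles 6..9 are the 4 shared ones.
pop1 = np.zeros(16)
pop1[:10] = 0.1
pop2 = np.zeros(16)
pop2[6:] = 0.1

weights = np.array([0.5, 0.5])                 # assumed equal population weights
pooled = weights[0] * pop1 + weights[1] * pop2
h_gamma = shannon(pooled)
h_alpha = weights[0] * shannon(pop1) + weights[1] * shannon(pop2)
delta_d = (h_gamma - h_alpha) / shannon(weights)

print(round(delta_d, 4))  # 0.6, i.e. 1 - A/S with A = 4 and S = 10
```

Algebraically, the entropy excess here is 0.6 log 2, so dividing by log 2 returns exactly the nonshared proportion 1 − 4/10.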

In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would be surely observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
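The negative bias of the observed (plug-in) entropy under severe undersampling can be seen in a small simulation. The Python sketch below uses hypothetical settings (50 equally common species, samples of 20 individuals, 200 replicates) chosen only for illustration; it does not implement the Chao and Jost (2015) estimator.

```python
import numpy as np

def observed_entropy(counts):
    """Plug-in Shannon entropy: sample proportions substituted into the formula."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
n_species, sample_size, n_reps = 50, 20, 200   # hypothetical, severely undersampled
true_entropy = np.log(n_species)               # entropy of the even community

estimates = np.array([
    observed_entropy(rng.multinomial(sample_size, np.full(n_species, 1.0 / n_species)))
    for _ in range(n_reps)
])
mean_bias = estimates.mean() - true_entropy
```

Because a sample of 20 individuals can contain at most 20 species, the observed entropy can never exceed log 20, which is below the true value of log 50, so `mean_bias` is necessarily negative in this setting.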

Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity Dβ and differentiation ΔD measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.

The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.

Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural

TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

Level                      Diversity
3 Hawaiian Archipelago     Dγ = 8.404
2 Region                   Dγ(2) = Dγ; Dα(2) = 8.290; Dβ(2) = 1.012
1 Island (community)       Dγ(1) = Dα(2); Dα(1) = 7.690; Dβ(1) = 1.065

Differentiation among aggregates at each level
2 Region                   ΔD(2) = 0.014
1 Island (community)       ΔD(1) = 0.033

populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive; in others, it is negative; and in yet other cases, there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization, and therefore they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.

In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)'s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)'s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
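As an illustration of how branch lengths enter the order q = 1 measure, the Python sketch below evaluates the phylogenetic entropy for a single assemblage from branch lengths and node abundances, in the same form as the pooled-assemblage term of the appendix code. The function name `phylo_diversity_q1` and the two toy trees are hypothetical; node abundance here means the summed relative abundance of all species descending from a branch.

```python
import numpy as np

def phylo_diversity_q1(branch_lengths, node_abundances):
    """Order-1 phylogenetic diversity from branch lengths L_i and node abundances a_i."""
    lengths = np.asarray(branch_lengths, dtype=float)
    abund = np.asarray(node_abundances, dtype=float)
    mean_t = float((lengths * abund).sum())    # abundance-weighted mean tree height
    keep = abund > 0
    entropy = -(lengths[keep] * (abund[keep] / mean_t)
                * np.log(abund[keep] / mean_t)).sum()
    pd_q1 = float(np.exp(entropy))             # in units of branch length
    return {"mean_T": mean_t, "PD_q1": pd_q1, "mean_D_q1": pd_q1 / mean_t}

# Star tree: two equally abundant species, each on a branch of length 2.
star = phylo_diversity_q1([2.0, 2.0], [0.5, 0.5])

# Tree ((sp1:1, sp2:1):1, sp3:2): three tips plus one internal branch.
# Branch order: sp1, sp2, sp3, internal (sp1 + sp2).
nested = phylo_diversity_q1([1.0, 1.0, 2.0, 1.0], [0.25, 0.25, 0.5, 0.5])
```

For the star tree the mean diversity equals the number of species, as with ordinary Shannon diversity, while the shared internal branch in the nested tree pulls the effective number of lineages below three.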

There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithms may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.

The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.

An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.


ACKNOWLEDGEMENTS

This work was assisted through participation in the “Next Generation Genetic Monitoring” Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center’s Coral Reef Ecosystem Division (CRED), with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.

DATA ARCHIVING STATEMENT

All data used in this manuscript are available in DRYAD (https://doi.org/dx.doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).

ORCID

Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493

Christine Edwards http://orcid.org/0000-0001-8837-4872

REFERENCES

Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2

Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101

Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.

Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis “marshi” (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665

Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.

Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.

Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654

Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551

Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247

Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272

Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540

Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349

Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471

Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014

Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819

Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083

Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1

Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2

Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036

Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75

Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352

Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x

Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1

Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x

Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1

Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x

Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum x falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.


Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.

Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.

Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3

Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x

Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.

Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299

Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x

Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x

Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.

Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x

Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.

Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.

Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696

Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B-Biological Sciences, 283(1829).

Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a “pristine” coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z

Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765

Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x

Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.

Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422

Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756

de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705

Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.

Whitlock, M. C. (2011). G’ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x

Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516

Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.

Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593


APPENDIX S1. Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an “ideal” population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the “effective” number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
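The equivalence described in this appendix can be checked with a few lines of R. The sketch below assumes the usual definitions, which the appendix does not restate: expected heterozygosity He = 1 − Σ p_i² and effective number of alleles 1 / (1 − He). The object names (`freq`, `He`, `n_eff`) are illustrative only.

```r
# Allele frequencies from the Appendix S1 example (rows = alleles A1..A5).
freq <- cbind(
  pop1 = c(0.5, 0.5, 0, 0, 0),
  pop2 = c(0.01, 0.10, 0.665, 0.215, 0.01)
)

# Expected heterozygosity (assumed definition): He = 1 - sum of squared frequencies.
He <- 1 - colSums(freq^2)

# Effective number of alleles (assumed definition): the number of equally
# frequent alleles that would give the same heterozygosity.
n_eff <- 1 / (1 - He)

print(round(He, 3))
print(round(n_eff, 2))
```

Both populations come out at, or very close to, two effective alleles, even though population 2 carries five distinct alleles.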

APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol                      Definition
${}^{q}D$                   abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$                  phylogenetic diversity of order q
$S$                         total number of distinct elements (species/alleles)
$H$                         Shannon entropy
$K$                         number of regions
$J_k$                       number of local populations/communities within region k
$w_{jk}$                    weight given to population/community j of region k
$w_{+k}$                    weight given to region k
$D_\alpha^{(l)}$            alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$             beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$            gamma abundance diversity at level l of the hierarchy
$l$                         superscript to denote population/community, subregion, region, …
$D_\gamma$                  gamma abundance diversity at the ecosystem level
$N_{injk}$                  number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$                   total number of copies of allele/species i in population/community j of region k
$N_{+jk}$                   total number of alleles/individuals in population j of region k
$N_{++k}$                   total number of alleles/individuals in region k
$N_{+++}$                   total number of alleles/individuals in the ecosystem
$p_{i|jk}$                  frequency of allele/species i in population j of region k
$p_{i|+k}$                  frequency of allele/species i in region k
$p_{i|++}$                  frequency of allele/species i in the ecosystem
$H_{\alpha,jk}^{(1)}$       alpha entropy for population j of region k
$H_{\alpha,k}^{(2)}$        alpha entropy for region k
$H_\alpha^{(l)}$            total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$            abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$                         number of branch segments in the phylogenetic tree
$L_i$                       length of branch i = 1, 2, 3, …, B
$a_i$                       total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$                   mean branch length
$T$                         depth of an ultrametric tree (= $\bar{T}$)
$Q$                         Rao’s quadratic entropy
$I$                         phylogenetic entropy
$a_{i|jk}$                  total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$                  total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$                  total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$           alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$            beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$           gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$                 gamma phylogenetic diversity at the ecosystem level
$I_{\alpha,jk}^{(1)}$       alpha phylogenetic entropy for population j of region k
$I_{\alpha,k}^{(2)}$        alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$            total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$         phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)

APPENDIX S3. Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term “allele frequency” with “species frequency.” We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \left(\sum_{j=1}^{J} w_j p_{i|j}\right) \ln\left(\sum_{m=1}^{J} w_m p_{i|m}\right)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\left(\sum_{j=1}^{J} w_j p_{i|j}\right) \ln\left(\sum_{j=1}^{J} w_j p_{i|j}\right) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \left(\sum_{j=1}^{J} w_j p_{i|j}\right) \ln\left(\sum_{j=1}^{J} w_j p_{i|j}\right) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

$$H_\gamma = -\sum_{i=1}^{S} \left[\sum_{j=1}^{J} w_j p_{i|j} \ln\left(\sum_{m=1}^{J} w_m p_{i|m}\right)\right] \le -\sum_{i=1}^{S} \left[\sum_{j=1}^{J} w_j p_{i|j} \ln\left(w_j p_{i|j}\right)\right]$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
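The two bounds in the theorem above can be illustrated numerically. The following R sketch is not part of the original derivation; the helper names (`shannon`, `two_level`) and the three small frequency matrices are hypothetical and chosen only to hit the lower bound, the upper bound and an intermediate case.

```r
# Shannon entropy (natural log) of a frequency vector; zero entries are skipped.
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# P: alleles-by-populations matrix of relative frequencies (columns sum to 1).
# w: population weights summing to 1.
# Returns H_gamma - H_alpha together with its upper bound -sum(w * log(w)).
two_level <- function(P, w) {
  H_gamma <- shannon(as.vector(P %*% w))    # entropy of pooled frequencies p_{i|+}
  H_alpha <- sum(w * apply(P, 2, shannon))  # weighted within-population entropy
  c(diff = H_gamma - H_alpha, bound = shannon(w))
}

w <- c(0.2, 0.3, 0.5)

# (i) identical populations: the difference should be 0
identical_pops <- matrix(c(0.6, 0.3, 0.1), nrow = 3, ncol = 3)

# (ii) completely distinct populations (two private alleles each):
#      the difference should equal the bound
distinct_pops <- rbind(diag(3) * 0.5, diag(3) * 0.5)

# (iii) partially overlapping populations: strictly between 0 and the bound
mixed_pops <- cbind(c(0.7, 0.2, 0.1, 0), c(0.1, 0.1, 0.4, 0.4), c(0, 0.5, 0.25, 0.25))

res_identical <- two_level(identical_pops, w)
res_distinct  <- two_level(distinct_pops, w)
res_mixed     <- two_level(mixed_pops, w)
print(rbind(res_identical, res_distinct, res_mixed))
```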

From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and “true dissimilarity” properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the “true dissimilarity” property: If multiple communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.
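Property (B) lends itself to a quick numerical illustration. The R sketch below builds a hypothetical set of J = 3 communities with S = 10 equally common species each, of which A = 4 are shared by all, and evaluates the Shannon differentiation of Eq. (C1). Equal community weights are an assumption made here for simplicity; the helper names are illustrative.

```r
# Shannon entropy (natural log), skipping zero frequencies.
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Shannon differentiation (Eq. C1) for a species-by-community frequency matrix P
# and community weights w.
shannon_diff <- function(P, w) {
  H_gamma <- shannon(as.vector(P %*% w))
  H_alpha <- sum(w * apply(P, 2, shannon))
  (H_gamma - H_alpha) / shannon(w)
}

J <- 3   # number of communities (hypothetical)
S <- 10  # species per community, all equally common
A <- 4   # species shared by every community

n_total <- A + J * (S - A)               # shared species plus all private species
P <- matrix(0, nrow = n_total, ncol = J)
P[seq_len(A), ] <- 1 / S                 # shared species occur in every community
for (j in seq_len(J)) {
  private_rows <- A + (j - 1) * (S - A) + seq_len(S - A)
  P[private_rows, j] <- 1 / S            # private species occur in one community only
}

w <- rep(1 / J, J)                       # equal weights (assumption)
delta <- shannon_diff(P, w)
print(c(delta = delta, expected = 1 - A / S))
```

Under these assumptions the computed differentiation matches 1 − A/S, the proportion of non-shared species in a community.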

Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the “gamma” entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \left(\sum_{k=1}^{K} w_{+k} p_{i|+k}\right) \ln\left(\sum_{m=1}^{K} w_{+m} p_{i|+m}\right).$$

The corresponding “alpha” entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele i in population j, with population weights $(w_{1k}/w_{+k}, w_{2k}/w_{+k}, \ldots, w_{J_k k}/w_{+k})$, j = 1, 2, …, $J_k$. The “gamma” entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha,k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding “alpha” entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha,jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha,k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha,jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln\frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\left(w_{jk}/w_{+k}\right)}. \tag{C6}$$

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$H_\gamma = -\sum_{i=1}^{S} \left[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\left(w_{jk} p_{i|jk}\right)\right]$$

$$= -\sum_{i=1}^{S} \left[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \left(\ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k}\right)\right]$$

$$= -\sum_{i=1}^{S} \left[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk}\right] - \sum_{i=1}^{S} \left[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\frac{w_{jk}}{w_{+k}}\right] - \sum_{i=1}^{S} \left[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k}\right]$$

$$= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k}$$

$$\equiv H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right].$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
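The special case above (all populations completely distinct) can be verified numerically. The R sketch below uses a hypothetical three-level layout with two regions, four populations and eight private alleles; all object names and weights are illustrative assumptions. In this setting both normalized differentiation measures should reach their maximum of 1.

```r
# Shannon entropy (natural log), skipping zero frequencies.
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

region <- c(1, 1, 2, 2)               # region membership of the four populations
w_jk   <- c(0.1, 0.2, 0.3, 0.4)       # population weights, summing to 1 (hypothetical)

# Eight alleles, two private alleles per population (frequencies 0.25 and 0.75).
P <- rbind(diag(4) * 0.25, diag(4) * 0.75)

regions <- unique(region)
w_k <- sapply(regions, function(k) sum(w_jk[region == k]))   # region weights w_{+k}

# Regional frequencies p_{i|+k}: within-region weighted mean of population frequencies.
P_k <- sapply(regions, function(k) {
  idx <- region == k
  as.vector(P[, idx, drop = FALSE] %*% (w_jk[idx] / sum(w_jk[idx])))
})

H_gamma  <- shannon(as.vector(P %*% w_jk))
H_alpha2 <- sum(w_k  * apply(P_k, 2, shannon))
H_alpha1 <- sum(w_jk * apply(P,   2, shannon))

# Normalizing constants: -sum_k w_{+k} ln w_{+k} and
# -sum_k sum_j w_{jk} ln(w_{jk} / w_{+k}) = shannon(w_jk) - shannon(w_k).
max_regions <- shannon(w_k)
max_pops    <- shannon(w_jk) - shannon(w_k)

Delta2 <- (H_gamma  - H_alpha2) / max_regions   # among regions
Delta1 <- (H_alpha2 - H_alpha1) / max_pops      # among populations within regions
print(c(Delta2 = Delta2, Delta1 = Delta1))
```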

As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and “true dissimilarity” properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \left[I_\alpha^{(2)} - I_\alpha^{(1)}\right] + \left[I_\gamma - I_\alpha^{(2)}\right]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and “true dissimilarity” properties, are parallel to those of allelic diversity and thus are omitted.

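SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represent the ecosystem. Shannon entropies $H^{*}$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 layout: rows correspond to hierarchical levels 3 (Ecosystem), 2 (Region) and 1 (Population or Community); columns are Within, Between, Total and Decomposition. Population frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$ give the entropies $H_{\alpha,11}^{(1)}$, $H_{\alpha,21}^{(1)}$, $H_{\alpha,31}^{(1)}$, $H_{\alpha,42}^{(1)}$, $H_{\alpha,52}^{(1)}$ and their total $H_\alpha^{(1)}$; regional frequencies $p_{i|+1}$, $p_{i|+2}$ give $H_{\alpha,1}^{(2)}$, $H_{\alpha,2}^{(2)}$ and their total $H_\alpha^{(2)}$; ecosystem frequencies $p_{i|++}$ give $H_\gamma$.]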

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/alleles abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2 layout: terminal nodes $p_1$, $p_2$, $p_3$, $p_4$, $p_5$; internal nodes $p_1 + p_2$, $p_4 + p_5$ and $p_1 + p_2 + p_3$; root labelled MRCA.]
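The extended abundance set described in the caption of Figure S2 can be assembled directly in R. In the sketch below, the species abundances `p` and the branch lengths `L` are hypothetical values chosen so that the tree is ultrametric with depth 3; they are not taken from the figure. The last line uses the order-1 phylogenetic diversity of Chao, Chiu and Jost (2010), $T \exp(I/T)$, which is assumed here rather than restated from the caption.

```r
# Hypothetical relative abundances of the five species/alleles (sum to 1).
p <- c(0.30, 0.20, 0.10, 0.25, 0.15)

# Extended abundance set for the 8 branches: five terminal branches, then the
# branches leading to the internal nodes (1,2), (1,2,3) and (4,5).
a <- c(p, p[1] + p[2], p[1] + p[2] + p[3], p[4] + p[5])

# Hypothetical branch lengths, in the same order as 'a', giving an ultrametric
# tree of depth 3 (every root-to-tip path sums to 3).
L <- c(1, 1, 2, 1.5, 1.5, 1, 1, 1.5)

T_bar <- sum(L * a)               # mean branch length; equals the tree depth here
I_phy <- -sum(L * a * log(a))     # phylogenetic entropy
PD1   <- T_bar * exp(I_phy / T_bar)  # order-1 phylogenetic diversity (assumed formula)

print(c(T_bar = T_bar, I = I_phy, PD1 = PD1))
```

Because the tree is ultrametric, the abundance-weighted mean branch length `T_bar` coincides with the tree depth, which is a convenient sanity check on the branch bookkeeping.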

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called “Abun” and “Struc”, respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle “blank” or “NA” entry. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1
        Population 1
        Population 2
    Region 2
        Population 3
        Population 4
        Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

Pop1 Pop2 Pop3 Pop4 Pop5

Allele1 1 16 2 10 15

Allele2 0 0 0 5 14

Allele3 7 12 11 1 0

Allele4 0 5 14 1 21

Allele5 2 1 0 11 10

Allele6 0 1 3 2 0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

Pop1 Pop2 Pop3 Pop4 Pop5

Level3 Ecosystem Ecosystem Ecosystem Ecosystem Ecosystem

Level2 Region1 Region1 Region2 Region2 Region2

Level1 Population1 Population2 Population3 Population4 Population5

For the above simple example, input data for R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

# Run R function
iDIP(Data, Struc)

5

Output is given below

[1]

D_gamma 5272

D_alpha2 4679

D_alpha1 3540

D_beta2 1127

D_beta1 1322

Proportion2 0300

Proportion1 0700

Differentiation2 0204

Differentiation1 0310

We give simple interpretations of the above output ("effective" is always meant in the sense of equally abundant alleles, populations or regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha.2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta.2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 ≈ 5.272 (= D_gamma).

(1c) D_alpha.1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta.1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 ≈ 4.679 allele equivalents per region (= D_alpha.2).

(2) Proportion.2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion.1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation.2 = 0.204 implies that the mean differentiation (dissimilarity) among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation.1 = 0.310 implies that the mean differentiation (dissimilarity) among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
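The relations among these outputs can be summarised compactly. The following is a sketch read off the R code of iDIP in this appendix rather than a quotation from the main text. The symbols are introduced here for illustration only: $H_\gamma$ is the Shannon entropy of the pooled allele frequencies, $H_\alpha^{(l)}$ is the weighted mean within-aggregate entropy at level $l$, and $H_W^{(l)}$ is the entropy of the relative aggregate sizes (weights) at level $l$, for the three-level example (level 2 = regions, level 1 = populations).

```latex
\begin{aligned}
D_\gamma &= \exp(H_\gamma), \qquad
D_\alpha^{(l)} = \exp\!\big(H_\alpha^{(l)}\big), \quad l = 1, 2, \\
D_\beta^{(2)} &= D_\gamma / D_\alpha^{(2)}, \qquad
D_\beta^{(1)} = D_\alpha^{(2)} / D_\alpha^{(1)}, \\
\text{Proportion}_2 &= \frac{H_\gamma - H_\alpha^{(2)}}{H_\gamma - H_\alpha^{(1)}}, \qquad
\text{Proportion}_1 = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{H_\gamma - H_\alpha^{(1)}}, \\
\text{Differentiation}_2 &= \frac{H_\gamma - H_\alpha^{(2)}}{H_W^{(2)}}, \qquad
\text{Differentiation}_1 = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{H_W^{(1)} - H_W^{(2)}}.
\end{aligned}
```

With the example output, $\ln 5.272 - \ln 4.679 \approx 0.119$ and $\ln 5.272 - \ln 3.540 \approx 0.398$, so the regional share of beta information is about $0.119 / 0.398 \approx 0.30$, in agreement with Proportion.2.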

R code for Information-based Diversity Partitioning for Any Number of Levels

Input data:
    abun:  alleles-by-population frequency matrix (or species-by-community frequency matrix)
    struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)

Output:
    (1) Gamma (or total) diversity, alpha and beta diversity at each level.
    (2) Proportion of total beta information found at each level.
    (3) Differentiation (dissimilarity) for each level. For example, Differentiation.1 measures the mean dissimilarity among populations (level 1) within a region; Differentiation.2 measures the mean dissimilarity among regions (level 2), etc.

NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

Main function iDIP

    iDIP = function(abun, struc) {
      n = sum(abun); N = ncol(abun)
      ga = rowSums(abun)
      gp = ga[ga > 0] / n
      G = sum(-gp * log(gp))                 # gamma (total) Shannon entropy
      S = length(gp)
      H = nrow(struc)                        # number of hierarchical levels
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      # Lowest level (populations or communities)
      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[H - 1] = sum(wi * Ai)

      # Intermediate levels: pool the populations that share a label
      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          ai = matrix(0, ncol = NN, nrow = nrow(abun))
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              ai[, j] = abun[, II]
            } else {
              ai[, j] = rowSums(abun[, II])
            }
          }
          pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
          wi = colSums(ai) / sum(ai)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
          A[i - 1] = sum(wi * Ai)
        }
      }

      total = G - A[H - 1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
      out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) <- c(paste0("D_gamma"),
                         paste0("D_alpha.", (H - 1):1),
                         paste0("D_beta.", (H - 1):1),
                         paste0("Proportion.", (H - 1):1),
                         paste0("Differentiation.", (H - 1):1))
      return(out)
    }
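As a cross-check on the worked example, the same order-1 decomposition can be sketched in a few lines of standard-library Python. This is an independent illustration under stated assumptions, not the authors' implementation: the function names `shannon` and `idip` are hypothetical, the structure matrix is assumed to list the top level first, and the last row is assumed to give every population its own label.

```python
import math


def shannon(counts):
    """Shannon entropy (natural log) of a list of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def idip(abun, struc):
    """Order-1 hierarchical decomposition (illustrative sketch).

    abun  : list of rows (alleles or species), one count per population.
    struc : list of label rows, top level first; the last row is assumed
            to give a distinct label to every population.
    """
    n = sum(sum(row) for row in abun)
    h_gamma = shannon([sum(row) for row in abun])
    alphas, weight_entropies = [], []
    for labels in struc[1:]:                 # skip the single top-level aggregate
        groups = {}
        for col, label in enumerate(labels):
            groups.setdefault(label, []).append(col)
        h_alpha = h_w = 0.0
        for cols in groups.values():
            pooled = [sum(row[c] for c in cols) for row in abun]
            w = sum(pooled) / n              # relative size of this aggregate
            h_alpha += w * shannon(pooled)
            h_w -= w * math.log(w)
        alphas.append(h_alpha)
        weight_entropies.append(h_w)

    total_beta = h_gamma - alphas[-1]
    out = {"D_gamma": math.exp(h_gamma)}
    upper_h, upper_w = h_gamma, 0.0
    for idx, (h_alpha, h_w) in enumerate(zip(alphas, weight_entropies)):
        level = len(alphas) - idx            # highest level first, as in the R output
        out[f"D_alpha.{level}"] = math.exp(h_alpha)
        out[f"D_beta.{level}"] = math.exp(upper_h - h_alpha)
        out[f"Proportion.{level}"] = (upper_h - h_alpha) / total_beta
        out[f"Differentiation.{level}"] = (upper_h - h_alpha) / (h_w - upper_w)
        upper_h, upper_w = h_alpha, h_w
    return out


# Allele counts of the simple example (rows = alleles, columns = Pop1..Pop5).
data = [
    [1, 16, 2, 10, 15],
    [0, 0, 0, 5, 14],
    [7, 12, 11, 1, 0],
    [0, 5, 14, 1, 21],
    [2, 1, 0, 11, 10],
    [0, 1, 3, 2, 0],
]
struc = [
    ["Ecosystem"] * 5,
    ["Region1", "Region1", "Region2", "Region2", "Region2"],
    ["Population1", "Population2", "Population3", "Population4", "Population5"],
]
result = idip(data, struc)
for name, value in result.items():
    print(f"{name:18s} {value:.3f}")
```

Like the R function, this sketch fails on an empty population (the logarithm of a zero weight is undefined), and it divides by zero when a level shows no differentiation at all; a production version would need to guard both cases.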


Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) described above for species diversity, we also need to input a phylogenetic "Tree" in Newick format.

(a) Abun: species/allele (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: the hierarchical structure matrix; see the simple example given above for species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.

    # Input data
    Data = cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
                 c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
    rownames(Data) = paste0("Allele", 1:6)
    Struc = rbind(rep("Ecosystem", 5),
                  c(rep("Region1", 2), rep("Region2", 3)),
                  paste0("Community", 1:5))
    Tree = c("(((Allele1:1.666254448,Allele2:2.886156926):4.370264926,Allele3:5.919367445):4.349065302,(Allele4:0.967060281,Allele5:4.965919121,Allele6:1.5361314):5.492297125);")

    # Run R function
    iDIPphylo(Data, Struc, Tree)

Output is given below.

                    [,1]
    Faith's PD   32.1525
    mean_T        9.4169
    PD_gamma     27.4388
    PD_alpha.2   25.5194
    PD_alpha.1   22.3231
    PD_beta.2     1.075
    PD_beta.1     1.143
    PD_prop.2     0.351
    PD_prop.1     0.649
    PD_diff.2     0.124
    PD_diff.1     0.149

"Effective" in the following interpretation is meant in the sense of equally abundant and equally divergent lineages, communities or regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 32.1525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 9.4169.

(3a) PD_gamma = 27.4388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 27.4388.

(3b) PD_alpha.2 = 25.5194 means that the effective total branch length per region is 25.5194. PD_beta.2 = 1.075 means that there are 1.08 region equivalents. Thus 25.5194 x 1.075 ≈ 27.4388 (= PD_gamma).

(3c) PD_alpha.1 = 22.3231 means that the effective total branch length per population within each region is 22.3231. PD_beta.1 = 1.143 implies that there are 1.14 population equivalents per region. Here 22.3231 x 1.143 ≈ 25.5194 (= PD_alpha.2).

(4) PD_prop.2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop.1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff.2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff.1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.

R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

Input data:
    abun:  alleles-by-population frequency matrix (or species-by-community frequency matrix)
    struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
    tree:  a Newick-format phylogenetic tree spanned by all focal species considered in a study

NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

Output:
    (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
    (2) mean_T: weighted (by species abundance) mean of the distances from the root node to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
    (3) Gamma (or total) phylogenetic diversity (PD) of order 1, and alpha and beta PD for each level.
    (4) PD_prop.1 and PD_prop.2 measure the proportions of total phylogenetic beta information found at Level 1 and Level 2, respectively.
    (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff.1 measures the mean phylogenetic dissimilarity among communities (Level 1) within a region (Level 2); PD_diff.2 measures the mean phylogenetic dissimilarity among regions (Level 2), etc.

Three packages, "ade4", "ape" and "phytools", must be installed first.

    install.packages("ade4")
    library(ade4)
    install.packages("ape")
    library(ape)
    install.packages("phytools")
    library(phytools)

Main function iDIPphylo

    iDIPphylo = function(abun, struc, tree) {
      phyloData <- newick2phylog(tree)
      Temp <- as.matrix(abun[names(phyloData$leaves), ])
      nodenames = c(names(phyloData$leaves), names(phyloData$nodes))

      # M[i, j] = 1 if branch j lies on the path from the root to leaf i
      M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
                 dimnames = list(names(phyloData$leaves), nodenames))
      for (i in 1:length(phyloData$leaves)) {
        M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
      }

      # pA: branch-by-population abundances; pB: branch lengths
      pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
                  dimnames = list(nodenames, colnames(abun)))
      for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
      pB = c(phyloData$leaves, phyloData$nodes)

      n = sum(abun); N = ncol(abun)
      ga = rowSums(pA)
      gp = ga / n; TT = sum(gp * pB)
      G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
      PD = sum(pB[gp > 0])

      H = nrow(struc)
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[H - 1] = sum(wi * Ai)

      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
            } else {
              pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
            }
          }
          # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
          wi = ni / sum(ni)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
          A[i - 1] = sum(wi * Ai)
        }
      }

      total = G - A[H - 1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      Gamma = exp(G) * TT; Alpha = exp(A) * TT; Diff = Diff; Prop = Prop
      # Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
      out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
      # out1 = iDIP(abun, struc)
      # out = cbind(out1, out2)
      rownames(out) <- c(paste("Faith's PD"),
                         paste("mean_T"),
                         paste0("PD_gamma"),
                         paste0("PD_alpha.", (H - 1):1),
                         paste0("PD_beta.", (H - 1):1),
                         paste0("PD_prop.", (H - 1):1),
                         paste0("PD_diff.", (H - 1):1))
      return(out)
    }


populations each). Note that here we used a hierarchy with four spatial subdivisions instead of the three levels used in the presentation of the framework. We made this choice because three levels keep the presentation of the calculations simple, whereas four levels allowed the simulations to verify the performance of the framework in a more in-depth manner.

We explored six scenarios varying in the degree of genetic structuring, from very strong (Figure 2, top left panel) to very weak (Figure 2, bottom right panel), and for each we generated spatially structured genetic data for 10 unlinked bi-allelic loci using an algorithm loosely based on the genetic model of Coop, Witonsky, Di Rienzo, and Pritchard (2010). More explicitly, to generate correlated allele frequencies across populations for bi-allelic loci, we drew 10 random vectors of dimension 32 from a multivariate normal distribution with mean zero and a covariance matrix corresponding to the particular genetic structure scenario being considered. To construct the covariance matrix, we first assumed that the covariance between populations decreased with distance, so that the off-diagonal elements (covariances) for closest geographic neighbours were set to 4, those for the second nearest neighbours were set to 3, and so on; as such, the main diagonal values (variances) were set to 5. By multiplying the off-diagonal elements of this variance-covariance matrix by a constant (δ), we manipulated the strength of the spatial genetic structure from strong (δ = 0.1, Figure 2) to weak (δ = 6, Figure 2). Delta values were chosen to demonstrate gradual changes in estimates across diversity components. Using this procedure, we generated a matrix of random normally distributed N(0, 1) deviates $\varepsilon_{il}$ for each population i and locus l. The random deviates were transformed into allele frequencies constrained between 0 and 1 using the simple transform

$$
p_{il} =
\begin{cases}
0 & \text{if } \varepsilon_{il} < 0, \\
\varepsilon_{il} & \text{if } 0 \le \varepsilon_{il} \le 1, \\
1 & \text{if } \varepsilon_{il} > 1,
\end{cases}
$$

where $p_{il}$ is the relative frequency of allele A1 at the lth locus in population i, and therefore $q_{il} = (1 - p_{il})$ is the relative frequency of allele A2. Each bi-allelic locus was analysed separately by our framework, and estimated values of Dγ, Dα and Dβ for each spatial level (see Figure 1) were averaged across the 10 loci.

To simulate a realistic distribution of the number of individuals across populations, we generated random values from a log-normal distribution with mean 0 and log of standard deviation 1; these values were then multiplied by randomly generated deviates from a Poisson distribution with λ = 30 to obtain a wide range of population/community sizes. Rounded values (to mimic abundances of individuals) were then multiplied by $p_{il}$ and $q_{il}$ to generate allele abundances. Given that the number of individuals was randomly generated across populations, there is no spatial correlation in abundance of individuals across the landscape, which means that the genetic spatial patterns were solely determined by the variance-covariance matrix used to generate correlated allele frequencies across populations. This facilitates interpretation of the simulation results, allowing us to demonstrate that the framework can uncover subtle spatial effects associated with population connectivity (see below).

For each spatial structure, we generated 100 matrices of allele frequencies, and each matrix was analysed separately to obtain distributions for Dγ, Dα, Dβ and ΔD. Figure 2 presents heat maps of the correlation in allele frequencies across populations for one simulated data set under each δ value and shows that our algorithm can generate a wide range of genetic structures comparable to those generated by other, more complex simulation protocols (e.g., de Villemereuil, Frichot, Bazin, Francois, & Gaggiotti, 2014).

Figure 3 shows the distribution of Dα, Dβ and ΔD values for the three levels of geographic variation below the ecosystem level (i.e., Dγ, genetic diversity). The results clearly show that our framework detects differences in genetic diversity across different levels of spatial genetic structure. As expected, the effective number of alleles (Dα component, top row) increases per region and subregion as the spatial structure becomes weaker (i.e., from small to large δ values) but remains constant at the population level, as there is no spatial structure at this level (i.e., populations are panmictic), so diversity is independent of δ.

The Dβ component (middle row) quantifies the effective number of aggregates (regions, subregions, populations) at each hierarchical level of spatial subdivision. The larger the number of aggregates at a given level, the more heterogeneous that level is. Thus, it is also a measure of compositional dissimilarity at each level. We use this interpretation to describe the results in a more intuitive manner. As expected, as δ increases, dissimilarity between regions (middle left panel) decreases because spatial genetic structure becomes weaker, and the compositional dissimilarity among populations within subregions (middle right panel) increases because the strong spatial correlation among populations within subregions breaks down (Figure 3, centre left panel). The compositional dissimilarity between subregions within regions (Figure 3, middle centre panel) first increases and then decreases with increasing δ. This is due to an "edge effect" associated with the marginal status of the subregions at the extremes of the species range (extreme right and left subregions in Figure 2). As δ increases, the composition of the two subregions at the centre of the species range, which belong to different regions, changes more rapidly than that of the two marginal subregions. Thus, the compositional dissimilarity between subregions within regions increases. However, as δ continues to increase, spatial effects disappear and dissimilarity decreases.
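The transform from normal deviates to allele frequencies, and the step from frequencies to allele abundances, can be sketched as follows. This is a minimal illustration of those two steps only, under the assumption that a deviate and a (rounded) population size are already available; the function names and the example inputs are hypothetical, and the multivariate normal sampling itself is not reproduced here.

```python
def clip_to_frequency(eps):
    """Piecewise transform: 0 below 0, eps inside [0, 1], 1 above 1."""
    return min(1.0, max(0.0, eps))


def allele_abundances(n_individuals, eps):
    """Abundances of alleles A1 and A2 for one population and one locus."""
    p = clip_to_frequency(eps)   # relative frequency of allele A1
    q = 1.0 - p                  # relative frequency of allele A2
    return n_individuals * p, n_individuals * q


# Hypothetical deviates, chosen to hit each branch of the transform.
deviates = [-0.3, 0.25, 1.7]
frequencies = [clip_to_frequency(e) for e in deviates]

# Hypothetical population of 40 individuals with deviate 0.25.
a1, a2 = allele_abundances(40, 0.25)
print(frequencies, a1, a2)
```

Because deviates outside [0, 1] are clipped rather than rescaled, some loci become fixed for one allele in some populations, which is one way such a protocol can yield populations that share no variation at a locus.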

FIGURE 2 Heat maps of allele frequency correlations between pairs of populations for different δ values. Delta values control the strength of the spatial genetic structure among populations, with low δs having the strongest spatial correlation among populations. Each heat map represents the outcome of a single simulation, and each dot represents the allele frequency correlation between two populations. Thus, the diagonal represents the correlation of a population with itself and is always 1 regardless of the δ value considered in the simulation. Colours indicate range of correlation values. As in Figure 1, the dendrograms represent the spatial relationship (i.e., geographic distance) between populations.

The differentiation component ΔD (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity Dβ. This is expected, as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix in which different δ values were used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength of spatial genetic variation.

FIGURE 3 Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity, gamma, is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions).

For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ: Dγ = 1.6 on average across simulations for δ = 0.1, up to Dγ = 1.9 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here regions, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.

7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM

All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish, Etelis coruscans (Andrews et al., 2014), and a shallow-water fish, Zebrasoma flavescens (Eble et al., 2011).

The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string of uninhabited low islands, atolls, shoals and banks that are affected primarily by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km southwest of the MHI at Johnston Atoll and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).

FIGURE 4 Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass.

We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.

7.1 | Species diversity

Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, Dγ, in the Hawaiian archipelago is 49. In itself this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_{\alpha}^{(2)} = 37.77$; Island: $D_{\alpha}^{(1)} = 27.75$). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_{\beta}^{(2)} = 1.29$ represents the number of region equivalents in the Hawaiian archipelago, while $D_{\beta}^{(1)} = 1.361$ is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on the sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain ΔD (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
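The multiplicative structure of the decomposition, and the loss of roughly 10 species equivalents per level, can be checked directly from the values reported in Table 4. The short sketch below only repeats that arithmetic; the variable names are chosen here for illustration, and the small discrepancies reflect rounding of the published values.

```python
# Values reported in Table 4 (fish species diversity, order q = 1).
d_gamma = 48.744
d_alpha_region, d_beta_region = 37.773, 1.290
d_alpha_island, d_beta_island = 27.752, 1.361

# Alpha times beta at one level recovers the diversity of the level above.
region_product = d_alpha_region * d_beta_region    # compare with d_gamma
island_product = d_alpha_island * d_beta_island    # compare with d_alpha_region

# Species equivalents lost when descending one level in the hierarchy.
loss_to_region = d_gamma - d_alpha_region
loss_to_island = d_alpha_region - d_alpha_island

print(f"region alpha x beta = {region_product:.3f} (gamma = {d_gamma})")
print(f"island alpha x beta = {island_product:.3f} (region alpha = {d_alpha_region})")
print(f"losses per level: {loss_to_region:.2f}, {loss_to_island:.2f}")
```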

We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than the diversities of order q = 1 and 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and to a lesser extent Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa the relative density of one species, Chromis vanderbilti, is 55.1%.
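How the order q changes the weight given to rare elements can be seen with a small Hill-number calculation. The function below is a generic sketch of the standard Hill number formula, and the two abundance vectors are hypothetical illustrations, not data from the Hawaiian survey.

```python
import math


def hill_number(abundances, q):
    """Hill number (effective number of elements) of order q."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if abs(q - 1.0) < 1e-12:
        # Order 1 is defined as the limit q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))


# Hypothetical communities: one perfectly even, one dominated by a single species.
even = [10, 10, 10, 10, 10]
dominated = [55, 9, 9, 9, 9, 9]

even_profile = [hill_number(even, q) for q in (0, 1, 2)]
dominated_profile = [hill_number(dominated, q) for q in (0, 1, 2)]
print("even:     ", [round(d, 3) for d in even_profile])
print("dominated:", [round(d, 3) for d in dominated_profile])
```

For the even community all three orders return the same value, whereas for the dominated community the profile drops steadily as q increases, because rare species contribute less and less.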

Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.

7.2 | Genetic diversity

Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.

TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

Level                      Diversity
3 Hawaiian Archipelago     $D_{\gamma} = 48.744$
2 Region                   $D_{\gamma}^{(2)} = D_{\gamma}$, $D_{\alpha}^{(2)} = 37.773$, $D_{\beta}^{(2)} = 1.290$
1 Island (community)       $D_{\gamma}^{(1)} = D_{\alpha}^{(2)}$, $D_{\alpha}^{(1)} = 27.752$, $D_{\beta}^{(1)} = 1.361$

Differentiation among aggregates at each level
2 Region                   $\Delta_{D}^{(2)} = 0.290$
1 Island (community)       $\Delta_{D}^{(1)} = 0.153$

Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.

As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).

8 | DISCUSSION

Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, ..., community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.

Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and therefore give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g., bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts the Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.

FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities, (b) genetic diversity for Etelis coruscans, (c) genetic diversity for Zebrasoma flavescens

As proved by Chao et al. (2015, Appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population, and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost's D, do not possess either of these two properties.

Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.

FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness, (b) allelic richness for Etelis coruscans, (c) allelic richness for Zebrasoma flavescens

TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level                      Diversity
3 Hawaiian Archipelago     $D_{\gamma} = 8.249$
2 Region                   $D_{\gamma}^{(2)} = D_{\gamma}$, $D_{\alpha}^{(2)} = 8.083$, $D_{\beta}^{(2)} = 1.016$
1 Island (population)      $D_{\gamma}^{(1)} = D_{\alpha}^{(2)}$, $D_{\alpha}^{(1)} = 7.077$, $D_{\beta}^{(1)} = 1.117$

Differentiation among aggregates at each level
2 Region                   $\Delta_{D}^{(2)} = 0.023$
1 Island (community)       $\Delta_{D}^{(1)} = 0.062$

Recently, Karlin and Smouse (2017; Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%, whereas ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
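The two-population example can be verified numerically. The sketch below assumes equal population weights and uses the two-level form of the differentiation measure, that is, the difference between pooled and mean within-population Shannon entropy divided by ln 2 (the entropy of two equal weights); the allele labels and variable names are hypothetical.

```python
import math


def shannon(freqs):
    """Shannon entropy (natural log) of relative frequencies."""
    return -sum(f * math.log(f) for f in freqs if f > 0)


# Two populations, each with 10 equally frequent alleles, 4 of them shared.
pop1 = {f"a{i}": 0.1 for i in range(10)}        # alleles a0..a9
pop2 = {f"a{i}": 0.1 for i in range(6, 16)}     # alleles a6..a15; a6..a9 shared

# Pooled frequencies under equal weights (0.5 each).
pooled = {}
for pop in (pop1, pop2):
    for allele, freq in pop.items():
        pooled[allele] = pooled.get(allele, 0.0) + 0.5 * freq

h_gamma = shannon(pooled.values())
h_alpha = 0.5 * shannon(pop1.values()) + 0.5 * shannon(pop2.values())

# Normalize by the entropy of the two equal weights, ln 2.
differentiation = (h_gamma - h_alpha) / math.log(2)
print(f"differentiation = {differentiation:.4f}")   # true nonshared proportion: 1 - 4/10
```

The normalisation by ln 2 is what maps the excess entropy onto the [0, 1] range: it is the largest value the excess can take when two equally weighted populations share no alleles.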

In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown, so all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would surely be observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., obtained by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.

Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity D_β and differentiation Δ_D measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.

The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species and, therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.

Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural

TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci.

Level                      Diversity
3 Hawaiian Archipelago     D_γ = 8.404
2 Region                   D_γ^(2) = D_γ, D_α^(2) = 8.290, D_β^(2) = 1.012
1 Island (community)       D_γ^(1) = D_α^(2), D_α^(1) = 7.690, D_β^(1) = 1.065

Differentiation among aggregates at each level
2 Region                   Δ_D^(2) = 0.014
1 Island (community)       Δ_D^(1) = 0.033


populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization and, therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.

In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)'s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)'s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals, or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
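To make the "simple transformation of the phylogenetic entropy" concrete, the sketch below assumes the order q = 1 form for an ultrametric tree of depth T, namely PD = T · exp(I/T) with I = −Σ L_i a_i ln a_i, where L_i is the length of branch i and a_i the total relative abundance descended from it (notation of Appendix S2). The two toy trees and all function names are hypothetical and serve only as an illustration.

```python
import math

def phylogenetic_entropy(branches):
    """I = -sum_i L_i * a_i * ln(a_i); branches is a list of (L_i, a_i) pairs."""
    return -sum(length * a * math.log(a) for length, a in branches if a > 0)

def mean_phylogenetic_diversity_q1(branches, depth):
    """Effective number of equally divergent lineages, exp(I / T), for tree depth T."""
    return math.exp(phylogenetic_entropy(branches) / depth)

T = 2.0          # depth of both toy ultrametric trees
a = 1.0 / 3.0    # three equally abundant species A, B, C

# Star tree: every species sits on its own branch of length T.
star = [(2.0, a), (2.0, a), (2.0, a)]

# Nested tree ((A,B),C): A and B have tips of length 1 below a shared
# internal branch of length 1; C sits on a single branch of length 2.
nested = [(1.0, a), (1.0, a), (1.0, 2 * a), (2.0, a)]

d_star = mean_phylogenetic_diversity_q1(star, T)
d_nested = mean_phylogenetic_diversity_q1(nested, T)
pd_nested = T * d_nested   # branch (phylogenetic) diversity in units of branch length
print(d_star, d_nested, pd_nested)
```

With no shared history (star tree) the measure equals the number of species, whereas shared branch length between A and B lowers the effective number of lineages below three.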

There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level is transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithm may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.

The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.

An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.


ACKNOWLEDGEMENTS

This work was assisted through participation in "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED), with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.

DATA ARCHIVING STATEMENT

All data used in this manuscript are available in DRYAD (https://doi.org/dx.doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).

ORCID

Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493

Christine Edwards http://orcid.org/0000-0001-8837-4872

REFERENCES

Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2

Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101

Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.

Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665

Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.

Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.

Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654

Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551

Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247

Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272

Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540

Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349

Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471

Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014

Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819

Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083

Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1

Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2

Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036

Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75

Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352

Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x

Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1

Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x

Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1

Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x

Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum x falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.


Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.

Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.

Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3

Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x

Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.

Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299

Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x

Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x

Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.

Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x

Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.

Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.

Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696

Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B-Biological Sciences, 283(1829).

Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z

Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765

Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x

Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.

Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422

Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756

de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705

Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.

Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x

Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516

Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.

Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti, O. E., Chao, A., Peres-Neto, P., et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593


APPENDIX S1 Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
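The equivalence described in this example can be reproduced with a few lines of code. The sketch below assumes the usual heterozygosity-based (order q = 2) definitions, He = 1 − Σ p_i² and effective number 1/(1 − He); the function names are illustrative.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def effective_number_of_alleles(freqs):
    """Number of equally frequent alleles giving the same He: 1 / (1 - He)."""
    return 1.0 / (1.0 - expected_heterozygosity(freqs))

pop1 = [0.5, 0.5, 0.0, 0.0, 0.0]
pop2 = [0.01, 0.10, 0.665, 0.215, 0.01]

he_pop1 = expected_heterozygosity(pop1)
he_pop2 = expected_heterozygosity(pop2)
ne_pop1 = effective_number_of_alleles(pop1)
ne_pop2 = effective_number_of_alleles(pop2)
print(he_pop1, he_pop2, ne_pop1, ne_pop2)
```

For population 2 the heterozygosity and effective number match the tabulated 0.5 and two only after rounding, which is consistent with the table reporting He to one decimal place.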

APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol    Definition
$^{q}D$    abundance diversity of order q, also referred to as Hill number of order q
$^{q}PD$    phylogenetic diversity of order q
$S$    total number of distinct elements (species/alleles)
$H$    Shannon entropy
$K$    number of regions
$J_k$    number of local populations/communities within region k
$w_{jk}$    weight given to population/community j of region k
$w_{+k}$    weight given to region k
$D_\alpha^{(l)}$    alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$    beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$    gamma abundance diversity at level l of the hierarchy
$l$    superscript to denote population/community, subregion, region, …
$D_\gamma$    gamma abundance diversity at the ecosystem level
$N_{injk}$    number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$    total number of copies of allele/species i in population/community j of region k
$N_{+jk}$    total number of alleles/individuals in population j of region k
$N_{++k}$    total number of alleles/individuals in region k
$N_{+++}$    total number of alleles/individuals in the ecosystem
$p_{i|jk}$    frequency of allele/species i in population j of region k
$p_{i|+k}$    frequency of allele/species i in region k
$p_{i|++}$    frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$    alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$    alpha entropy for region k
$H_\alpha^{(l)}$    total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$    abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$    number of branch segments in the phylogenetic tree
$L_i$    length of branch i = 1, 2, 3, ⋯, B
$a_i$    total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$    mean branch length
$T$    depth of an ultrametric tree (= $\bar{T}$)
$Q$    Rao's quadratic entropy
$I$    phylogenetic entropy
$a_{i|jk}$    total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$    total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$    total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$    alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$    beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$    gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$    gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$    alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$    alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$    total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$    phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)


APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for two-level hierarchy and then extend all procedures to three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \cdots, w_J)$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ln \left( \sum_{j'=1}^{J} w_{j'} p_{i|j'} \right)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1. For the gamma and alpha entropies defined above for two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ln \left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ln \left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

$$H_\gamma = -\sum_{i=1}^{S} \left( \sum_{j=1}^{J} w_j p_{i|j} \ln \left( \sum_{j'=1}^{J} w_{j'} p_{i|j'} \right) \right) \le -\sum_{i=1}^{S} \left( \sum_{j=1}^{J} w_j p_{i|j} \ln \left( w_j p_{i|j} \right) \right)$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
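The bounds of the theorem can be checked numerically for unequal weights. The following sketch uses three hypothetical populations and arbitrary weights (all values are assumptions chosen for illustration) and evaluates the gap H_γ − H_α for identical, partially overlapping and completely distinct populations.

```python
import math

def entropy(p):
    """Shannon entropy (natural log); zero entries are skipped."""
    return -sum(x * math.log(x) for x in p if x > 0)

def gamma_alpha_gap(freqs, weights):
    """H_gamma - H_alpha for populations `freqs` with weights `weights`."""
    n_alleles = len(freqs[0])
    pooled = [sum(w * pop[i] for w, pop in zip(weights, freqs)) for i in range(n_alleles)]
    h_alpha = sum(w * entropy(pop) for w, pop in zip(weights, freqs))
    return entropy(pooled) - h_alpha

weights = [0.2, 0.3, 0.5]
upper_bound = entropy(weights)   # -sum_j w_j ln w_j

identical = [[0.7, 0.2, 0.1, 0.0]] * 3
partial = [[0.6, 0.4, 0.0, 0.0], [0.1, 0.4, 0.5, 0.0], [0.0, 0.0, 0.3, 0.7]]
distinct = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.5, 0.5, 0.0], [0.0, 0.0, 0.0, 1.0]]

gap_identical = gamma_alpha_gap(identical, weights)
gap_partial = gamma_alpha_gap(partial, weights)
gap_distinct = gamma_alpha_gap(distinct, weights)
print(gap_identical, gap_partial, gap_distinct, upper_bound)
```

The gap vanishes (up to floating-point error) for identical populations, reaches the upper bound when no alleles are shared, and falls strictly between the two otherwise.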


From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \qquad \text{(C1)}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:

(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.

(A2) Shannon differentiation measure is always non-decreasing when a new allele is added to a single population with any abundance. Here, the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: if multiple communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.

Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++},$$

$$H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \left[ H_\alpha^{(2)} - H_\alpha^{(1)} \right] + \left[ H_\gamma - H_\alpha^{(2)} \right]. \qquad \text{(C2)}$$

Here, $H_\alpha^{(1)}$ denotes the within-population information, $\left[ H_\alpha^{(2)} - H_\alpha^{(1)} \right]$ denotes the among-population information within a region, and $\left[ H_\gamma - H_\alpha^{(2)} \right]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C3)}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \qquad \text{(C4)}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $(w_{+1}, w_{+2}, \cdots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \left( \sum_{k=1}^{K} w_{+k} p_{i|+k} \ln \left( \sum_{k'=1}^{K} w_{+k'} p_{i|+k'} \right) \right).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq (C4) we consider the following two-level hierarchy (region k and all populations within region k) there are Jk populations with allele relative frequency 119901K|Ls for

allele i in population j with population weights lrulyu

l|ulyu

⋯ lpuulyu

119895 = 12⋯ 119869s The

ldquogammardquo entropy for this two-level hierarchy is the entropy value for region k ie 119867s(S) =

minussum 119901K|(s ln 119901K|(sNKOP where 119901K|(s = sum lmu

lyu119901K|Ls

TuLOP The corresponding ldquoalphardquo entropy is

sum lmulyu

119867Ls(P)Tu

LOP = minussum lmulyu

sum 119901K|LsNKOP ln 119901K|Ls

TuLOP Then Theorem C1 leads to

119867s(S) minusc

119908Ls119908(s

119867Ls(P)

Tu

LOP

= minusc119901K|(s ln 119901K|(s

N

KOP

+c119908Ls119908(s

c119901K|Ls

N

KOP

ln 119901K|Ls

Tu

LOP

le minusc119908Ls119908(s

ln119908Ls119908(s

Tu

LOP

Summing over $k$ with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k}\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k} + \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4). From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k}\ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\left(w_{jk}/w_{+k}\right)}. \tag{C6}$$

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma diversity as

$$
\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S}\left\{\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln\left(w_{jk}\,p_{i|jk}\right)\right\} \\
&= -\sum_{i=1}^{S}\left\{\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\left[\ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k}\right]\right\} \\
&= -\sum_{i=1}^{S}\left\{\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln p_{i|jk}\right\} - \sum_{i=1}^{S}\left\{\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln\frac{w_{jk}}{w_{+k}}\right\} - \sum_{i=1}^{S}\left\{\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln w_{+k}\right\} \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k}\ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right].
\end{aligned}
$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\left(w_{jk}/w_{+k}\right)$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\ln w_{+k}$.
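As a numerical sanity check of Eqs. (C5) and (C6), the following R sketch evaluates both normalized measures on two small, made-up three-level data sets: one in which no allele is shared among populations and one in which all populations have identical relative frequencies. The function name `delta_three_level`, the toy matrices and the region labels are hypothetical and are not part of the authors' code; the sketch simply assumes the weights are proportional to the column totals.

```r
# Shannon entropy of a relative-frequency vector (zero entries are skipped).
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# abun: alleles (rows) by populations (columns), raw counts.
# region: region label of each population (one entry per column).
delta_three_level <- function(abun, region) {
  n    <- sum(abun)
  w_jk <- colSums(abun) / n                      # population weights w_jk
  w_k  <- sapply(split(w_jk, region), sum)       # region weights w_+k
  # Pool the populations of each region to get region-level counts.
  reg_abun <- sapply(split(seq_len(ncol(abun)), region),
                     function(cols) rowSums(abun[, cols, drop = FALSE]))
  H_gamma  <- shannon(rowSums(abun) / n)
  H_alpha2 <- sum(w_k  * apply(reg_abun, 2, function(x) shannon(x / sum(x))))
  H_alpha1 <- sum(w_jk * apply(abun,     2, function(x) shannon(x / sum(x))))
  w_parent <- w_k[as.character(region)]          # w_+k matched to each population
  c(delta2 = (H_gamma  - H_alpha2) / (-sum(w_k  * log(w_k))),             # Eq. (C5)
    delta1 = (H_alpha2 - H_alpha1) / (-sum(w_jk * log(w_jk / w_parent)))) # Eq. (C6)
}

region    <- c(1, 1, 2)                          # populations 1-2 in region 1, population 3 in region 2
distinct  <- cbind(c(4, 6, 0, 0, 0, 0), c(0, 0, 5, 5, 0, 0), c(0, 0, 0, 0, 2, 8))
identical <- cbind(c(1, 2, 3), c(2, 4, 6), c(3, 6, 9))

res_distinct  <- delta_three_level(distinct, region)   # both measures at their maximum
res_identical <- delta_three_level(identical, region)  # both measures at their minimum
```

In the completely distinct case both components reach their upper bounds, so both ratios equal 1; in the identical case both numerators vanish.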

As we proved in Theorem C3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and “true dissimilarity” properties stated in Theorem C2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. Shannon gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i} L_i\,a_{i|++}\ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\sum_{i} L_i\,a_{i|+k}\ln a_{i|+k}, \qquad I_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\sum_{i} L_i\,a_{i|jk}\ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \left[I_\alpha^{(2)} - I_\alpha^{(1)}\right] + \left[I_\gamma - I_\alpha^{(2)}\right]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $\left[I_\alpha^{(2)} - I_\alpha^{(1)}\right]$ denotes the among-population phylogenetic information within a region, and $\left[I_\gamma - I_\alpha^{(2)}\right]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem, so that we can obtain the corresponding differentiation measures.

Theorem C4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth $T$:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T\sum_{k=1}^{K} w_{+k}\ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\left(w_{jk}/w_{+k}\right)$ and $I_\gamma - I_\alpha^{(2)} = -T\sum_{k=1}^{K} w_{+k}\ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and “true dissimilarity” properties, are parallel to those of allelic diversity and thus are omitted.
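For reference, dividing each among-aggregate component of Eq. (C7) by its upper bound in Eqs. (C8) and (C9) gives the normalized form sketched below, by direct analogy with Eqs. (C5) and (C6). The symbols $\Delta_{PD}^{(2)}$ and $\Delta_{PD}^{(1)}$ are placeholder labels assumed here for illustration; Table 3 of the main text gives the authors' own notation.

```latex
\Delta_{PD}^{(2)} = \frac{I_\gamma - I_\alpha^{(2)}}{-T\sum_{k=1}^{K} w_{+k}\ln w_{+k}},
\qquad
\Delta_{PD}^{(1)} = \frac{I_\alpha^{(2)} - I_\alpha^{(1)}}{-T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\left(w_{jk}/w_{+k}\right)}.
```

Both ratios lie in [0, 1] because each numerator is bounded below by 0 and above by the corresponding denominator.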

SUPPLEMENTARY INFORMATION

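Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represent the ecosystem. Shannon entropies $H_{\ast}$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 diagram. Columns: Within, Between, Total, Decomposition. Rows: Level 3 (Ecosystem), Level 2 (Region), Level 1 (Population or Community). Frequencies shown: $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$ at the population level; $p_{i|+1}$, $p_{i|+2}$ at the region level; $p_{i|++}$ at the ecosystem level. Entropies shown: $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$, $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$, $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$, $H_\alpha^{(1)}$, $H_\alpha^{(2)}$ and $H_\gamma$.]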

Figure S2. Example of an ultrametric tree, where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set, where the first elements (in the present example, 5) correspond to species/alleles abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species $i$.

[Figure S2 diagram. Tips: $p_1$, $p_2$, $p_3$, $p_4$, $p_5$. Internal nodes: $p_1 + p_2$, $p_4 + p_5$, $p_1 + p_2 + p_3$. Root: MRCA.]
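The extended abundance set described for Figure S2 can be sketched in R as follows. The relative abundances `p` and the branch lengths `L` are made-up values chosen so that the tree is ultrametric with depth 3; they are assumptions for illustration and do not come from the figure. The order-1 phylogenetic diversity is computed with the same expression that appears in the iDIPphylo code later in this supplement.

```r
# Hypothetical relative abundances of the five tips (species/alleles).
p <- c(0.10, 0.20, 0.30, 0.15, 0.25)

# Membership matrix: rows are tips, columns are the 8 nodes
# (5 tips, then the internal nodes {1,2}, {1,2,3} and {4,5}).
M <- cbind(diag(5),
           c(1, 1, 0, 0, 0),
           c(1, 1, 1, 0, 0),
           c(0, 0, 0, 1, 1))

# Extended abundance set: each node carries the summed abundance of its descendants.
a <- as.vector(p %*% M)

# Hypothetical branch lengths leading to each of the 8 nodes (every tip is at depth 3).
L <- c(1, 1, 2, 1, 1, 1, 1, 2)

T_bar   <- sum(L * a)                                   # abundance-weighted mean tip depth
I_phylo <- -sum(L * a * log(a))                         # phylogenetic entropy
PD_1    <- exp(-sum(L * (a / T_bar) * log(a / T_bar)))  # effective total branch length, order 1
```

Because the weighted branch lengths sum to the tree depth, `PD_1` equals `T_bar * exp(I_phylo / T_bar)`, which links the entropy to an effective total branch length.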

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R function iDIP)

Input should include two data matrices (called “Abun” and “Struc”, respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle “blank” or “NA” entry. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1
        Population 1
        Population 2
    Region 2
        Population 3
        Population 4
        Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

           Pop1   Pop2   Pop3   Pop4   Pop5
Allele1       1     16      2     10     15
Allele2       0      0      0      5     14
Allele3       7     12     11      1      0
Allele4       0      5     14      1     21
Allele5       2      1      0     11     10
Allele6       0      1      3      2      0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

           Pop1          Pop2          Pop3          Pop4          Pop5
Level3     Ecosystem     Ecosystem     Ecosystem     Ecosystem     Ecosystem
Level2     Region1       Region1       Region2       Region2       Region2
Level1     Population1   Population2   Population3   Population4   Population5

For the above simple example, input data for R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

# Run R function
iDIP(Data, Struc)

Output is given below:

                    [,1]
D_gamma            5.272
D_alpha.2          4.679
D_alpha.1          3.540
D_beta.2           1.127
D_beta.1           1.322
Proportion.2       0.300
Proportion.1       0.700
Differentiation.2  0.204
Differentiation.1  0.310

We give simple interpretations for the above output (all the “effective” is in a sense of equally abundant alleles/populations/regions):
(1a) D_gamma = 5.272 is interpreted as that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha.2 = 4.679 is interpreted as that each region contains 4.679 allele equivalents. D_beta.2 = 1.127 implies that there are 1.13 region equivalents. Thus, 4.679 x 1.127 = 5.272 (= D_gamma, up to rounding).
(1c) D_alpha.1 = 3.540 is interpreted as that each population within a region contains 3.540 allele equivalents. D_beta.1 = 1.322 is interpreted as that there are 1.32 population equivalents per region. Here, 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha.2).
(2) Proportion.2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion.1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation.2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation.1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
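The arithmetic behind these interpretations can be rechecked from the reported numbers alone. The short R sketch below uses only the rounded output values quoted above (the variable names are ad hoc), so agreement is expected only up to rounding. It assumes, as in the iDIP code that follows, that each “Proportion” is the share of the total beta information (log gamma minus log population-level alpha) contributed by one level.

```r
# Rounded values copied from the example output above.
D_gamma  <- 5.272
D_alpha2 <- 4.679
D_alpha1 <- 3.540
D_beta2  <- 1.127
D_beta1  <- 1.322

# Multiplicative decomposition: gamma = alpha(1) x beta(1) x beta(2).
gamma_rebuilt  <- D_alpha1 * D_beta1 * D_beta2
alpha2_rebuilt <- D_alpha1 * D_beta1

# Additive decomposition on the log (entropy) scale gives the proportions.
total_beta_info <- log(D_gamma / D_alpha1)
prop2 <- log(D_beta2) / total_beta_info   # share found at the regional level
prop1 <- log(D_beta1) / total_beta_info   # share found at the population level
```

Working on the log scale is what makes the two beta components additive, which is why the proportions sum to roughly one.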

# R code for Information-based Diversity Partitioning for Any Number of Levels
# input data:
#   abun: alleles by population frequency matrix data (or species by community frequency matrix data)
#   struc: level by population hierarchical structure matrix (or level by community hierarchical structure matrix)
# output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level
#   (2) Proportion of total beta information found at each level
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation 1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation 2 measures the mean dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entry. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.

# Main function iDIP
iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga > 0] / n
  G = sum(-gp * log(gp))            # gamma (pooled) entropy
  S = length(gp)
  H = nrow(struc)                   # number of hierarchical levels
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)
  # Lowest level (populations/communities)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H - 1] = sum(wi * Ai)
  # Intermediate levels: pool the populations belonging to each aggregate
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          ai[, j] = abun[, II]
        } else {
          ai[, j] = rowSums(abun[, II])
        }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i - 1] = sum(wi * Ai)
    }
  }
  total = G - A[H - 1]              # total beta information
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }
  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha.", (H - 1):1),
                     paste0("D_beta.", (H - 1):1),
                     paste0("Proportion.", (H - 1):1),
                     paste0("Differentiation.", (H - 1):1))
  return(out)
}

Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called “Abun” and “Struc”, respectively) as described in the species diversity, we also need to input a phylogenetic “Tree” (in Newick tree format).
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the “Abun” matrix should be exactly the same as those in the uploaded Newick tree format. iDIP cannot handle “blank” or “NA” entry. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying hierarchical structure matrix; see the simple example given above for the species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundances data as in the species diversity for illustration. A simulated phylogenetic tree for 6 species is given below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) = paste0("Allele",1:6)
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
Tree = c("(((Allele1:1.666254448,Allele2:2.886156926):4.370264926,Allele3:5.919367445):4.349065302,(Allele4:0.967060281,Allele5:4.965919121,Allele6:1.5361314):5.492297125);")

# Run R function
iDIPphylo(Data, Struc, Tree)

Output is given below:

               [,1]
Faith's PD  32.1525
mean_T       9.4169
PD_gamma    27.4388
PD_alpha.2  25.5194
PD_alpha.1  22.3231
PD_beta.2    1.075
PD_beta.1    1.143
PD_prop.2    0.351
PD_prop.1    0.649
PD_diff.2    0.124
PD_diff.1    0.149

The “effective” in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.
(1) The total branch length (Faith's PD) in the phylogenetic tree is 32.1525.
(2) The weighted (by species abundance) mean of the distances from root node to each of the tips is 9.4169.
(3a) PD_gamma = 27.4388 is interpreted as that the effective total branch length in the ecosystem (total phylogenetic diversity) is 27.4388.
(3b) PD_alpha.2 = 25.5194 is interpreted as that the effective total branch length per region is 25.5194. PD_beta.2 = 1.075 means that there are 1.08 region equivalents. Thus, 25.5194 x 1.075 = 27.4388 (= PD_gamma, up to rounding).
(3c) PD_alpha.1 = 22.3231 is interpreted as that the effective total branch length per population within each region is 22.3231. PD_beta.1 = 1.143 implies that there are 1.14 population equivalents per region. Here, 22.3231 x 1.143 = 25.5194 (= PD_alpha.2, up to rounding).
(4) PD_prop.2 = 0.351 means that the proportion of total phylogenetic beta information found in the regional level is 35.1%. PD_prop.1 = 0.649 means that the proportion of total phylogenetic beta information found in the community level is 64.9%.
(5) PD_diff.2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff.1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.

# R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
# input data:
#   abun: alleles by population frequency matrix data (or species by community frequency matrix data)
#   struc: level by population hierarchical structure matrix (or level by community hierarchical structure matrix)
#   tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study
# NOTE: iDIP cannot handle "blank" or "NA" entry. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.
# output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
#   (2) mean T: weighted (by species abundance) mean of the distances from root node
#       to each of the tips in a phylogenetic tree. For an ultrametric tree, mean T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level
#   (4) PD_prop.1 and PD_prop.2 measure the proportions of total phylogenetic beta
#       information found in Level 1 and Level 2, respectively
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example,
#       PD_diff.1 measures the mean phylogenetic dissimilarity among communities (Level 1)
#       within a region (Level 2); PD_diff.2 measures the mean phylogenetic dissimilarity
#       among regions (Level 2), etc.

# Three packages "ade4", "ape" and "phytools" must be installed first.
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

# Main function iDIPphylo

iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, ] flags every node (tip or internal) on the path from the root to tip i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }
  # pA: node-level abundances (extended abundance set) for each population
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
  pB = c(phyloData$leaves, phyloData$nodes)   # branch lengths
  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n
  TT = sum(gp * pB)                           # abundance-weighted mean tip depth
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])                        # Faith's PD
  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H - 1] = sum(wi * Ai)
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else {
          pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
        }
      }
      # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))   # inactive: 'ai' is not defined in this function
      wi = ni / sum(ni)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i - 1] = sum(wi * Ai)
    }
  }
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }
  # Gamma = exp(G) / TT; Alpha = exp(A) / TT   # inactive alternative scaling
  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  # out1 = iDIP(abun, struc); out = cbind(out1, out2)   # inactive: 'out2' is not defined
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha.", (H - 1):1),
                     paste0("PD_beta.", (H - 1):1),
                     paste0("PD_prop.", (H - 1):1),
                     paste0("PD_diff.", (H - 1):1))
  return(out)
}


The differentiation component $\Delta_D$ (bottom row) measures the mean proportion of nonshared alleles in each aggregate and follows the same trends across the strength of the spatial structure (i.e., across δ values) as the compositional dissimilarity $D_\beta$. This is expected as we kept the genetic variation equal across regions, subregions and populations. If we had used a nonstationary spatial covariance matrix in which different δ values would be used among populations, subregions and regions, then the beta and differentiation components would follow different trends in relation to the strength in spatial genetic variation.

FIGURE 3 Sampling variation (median, lower and upper quartiles, and extreme values) for the three diversity components examined in the simulation study (alpha, beta and differentiation; total diversity, gamma, is reported in the text only) across 100 simulated populations as a function of the strength (δ values) of the spatial genetic variation among the three spatial levels considered in this study (i.e., populations, subregions and regions)

For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ: $D_\gamma$ = 16 on average across simulations for δ = 0.1, up to $D_\gamma$ = 19 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here region, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as for the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.

7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM

All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently, such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish, Etelis coruscans (Andrews et al., 2014), and a shallow-water fish, Zebrasoma flavescens (Eble et al., 2011).

The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km southwest of the MHI, at Johnston Atoll, and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).

FIGURE 4 Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass

We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.

7.1 | Species diversity

Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, $D_\gamma$, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of another shallow-water coral reef ecosystem, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_\alpha^{(2)}$ = 37.77; Island: $D_\alpha^{(1)}$ = 27.75). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_\beta^{(2)}$ = 1.29 represents the number of region equivalents in the Hawaiian archipelago, while $D_\beta^{(1)}$ = 1.361 is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain $\Delta_D$ (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation in terms of the mean proportion of nonshared species is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions than among islands within a region.

We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than those of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and to a lesser extent Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
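The comparison of orders q = 0, 1, 2 can be illustrated with a minimal R sketch of Hill numbers. The function name `hill_number` and the two abundance vectors are hypothetical: `dominated` mimics a community in which one species holds 55% of the individuals, and `even` is a perfectly even community with the same richness. Neither vector is taken from the Hawaiian data.

```r
# Hill number (effective number of species/alleles) of order q.
hill_number <- function(abundance, q) {
  p <- abundance[abundance > 0] / sum(abundance)
  if (isTRUE(all.equal(q, 1))) {
    exp(-sum(p * log(p)))          # limit q -> 1: exponential of Shannon entropy
  } else {
    sum(p^q)^(1 / (1 - q))
  }
}

dominated <- c(55, 15, 10, 8, 5, 3, 2, 1, 1)   # one species holds 55% of individuals
even      <- rep(10, 9)                        # nine equally abundant species

orders <- c(0, 1, 2)
profile_dominated <- sapply(orders, hill_number, abundance = dominated)
profile_even      <- sapply(orders, hill_number, abundance = even)
```

For the even community all three orders return the richness, whereas for the dominated community the values fall as q increases, because rare species contribute less and less.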

Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.

7.2 | Genetic diversity

Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases, genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that, despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.

TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

Level                       Diversity
3 Hawaiian Archipelago      $D_\gamma$ = 48.744
2 Region                    $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)}$ = 37.773, $D_\beta^{(2)}$ = 1.290
1 Island (community)        $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)}$ = 27.752, $D_\beta^{(1)}$ = 1.361

Differentiation among aggregates at each level
2 Region                    $\Delta_D^{(2)}$ = 0.290
1 Island (community)        $\Delta_D^{(1)}$ = 0.153

Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.

As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).

8 | DISCUSSION

Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.

Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and, therefore, give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g., bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.

FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities, (b) genetic diversity for Etelis coruscans, (c) genetic diversity for Zebrasoma flavescens

As proved by Chao et al. (2015, appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost's D, do not possess either of these two properties.

Other uniform analyses of diversity based on Hill numbersfocusona two-levelhierarchy (communityandmeta-community)andprovidemeasuresthatcouldbeappliedtospeciesabundanceand allele count data as well as species distance matrices andfunctionaldata(egChiuampChao2014Kosman2014ScheinerKosman Presley amp Willig 2017ab) However ours is the onlyone that presents a framework that can be applied to hierarchi-cal systems with an arbitrary number of levels and can be used to deriveproperdifferentiationmeasures in the range [01]ateachlevel with desirable monotonicity and ldquotrue dissimilarityrdquo prop-erties (AppendixS3) Therefore our proposed beta diversity oforder q=1ateach level isalways interpretableandrealisticand

F IGURE 6emspDiagrammaticrepresentationofthehierarchicalstructureunderlyingtheHawaiiancoralreefdatabaseshowingobservedspeciesallelicrichness(inparentheses)fortheHawaiiancoralfishspecies(a)Speciesrichness(b)allelicrichnessforEtelis coruscans(c)allelicrichnessfor Zebrasoma flavescens

(a)

Spe

cies

div

ersi

ty(a

)S

peci

esdi

vers

ity

(b)

Gen

etic

div

ersi

tyE

coru

scan

sG

enet

icdi

vers

ityc

orus

cans

(c)

Gen

etic

div

ersi

tyZ

flab

esce

nsG

enet

icdi

vers

ityyyyyZZZ

flabe

scen

s

TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level                        Diversity

3 Hawaiian Archipelago       $D_\gamma = 8.249$

2 Region                     $D_\gamma^{(2)} = D_\gamma$,  $D_\alpha^{(2)} = 8.083$,  $D_\beta^{(2)} = 1.016$

1 Island (population)        $D_\gamma^{(1)} = D_\alpha^{(2)}$,  $D_\alpha^{(1)} = 7.077$,  $D_\beta^{(1)} = 1.117$

Differentiation among aggregates at each level

2 Region                     $\Delta_D^{(2)} = 0.023$

1 Island (community)         $\Delta_D^{(1)} = 0.062$


our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.

Recently, Karlin and Smouse (2017; Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then intuitively any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
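The two-population example above can be checked numerically. The following is a minimal sketch, assuming equal population weights (0.5 each) and the two-level Shannon differentiation of Appendix S3 (Eq. C1); the function names and allele labels are hypothetical and are not taken from the authors' R package.

```python
import math

def shannon_entropy(freqs):
    """Shannon entropy (natural log); zero frequencies contribute nothing."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def shannon_differentiation(pops, weights):
    """Normalized Shannon differentiation for a two-level hierarchy.

    pops    : list of dicts mapping allele label -> relative frequency
    weights : population weights w_j, summing to 1
    """
    alleles = set().union(*pops)
    # Pooled (ecosystem-level) frequency of each allele: sum_j w_j * p_{i|j}
    pooled = [sum(w * pop.get(a, 0.0) for pop, w in zip(pops, weights))
              for a in alleles]
    h_gamma = shannon_entropy(pooled)
    h_alpha = sum(w * shannon_entropy(pop.values())
                  for pop, w in zip(pops, weights))
    h_max = shannon_entropy(weights)  # -sum_j w_j ln w_j
    return (h_gamma - h_alpha) / h_max

# Populations I and II: 10 equally frequent alleles each, 4 of them shared.
pop_I = {f"shared_{i}": 0.1 for i in range(4)}
pop_I.update({f"private_I_{i}": 0.1 for i in range(6)})
pop_II = {f"shared_{i}": 0.1 for i in range(4)}
pop_II.update({f"private_II_{i}": 0.1 for i in range(6)})

delta = shannon_differentiation([pop_I, pop_II], [0.5, 0.5])
print(round(delta, 4))  # 0.6, i.e. 1 - A/S with A = 4 and S = 10
```

With equal weights the gamma entropy is ln 10 + 0.6 ln 2 and the alpha entropy is ln 10, so the normalized difference is 0.6 ln 2 / ln 2 = 0.6, the nonshared proportion.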

In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would be surely observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
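The negative bias of the observed (plug-in) entropy under severe undersampling can be illustrated with a small simulation. This is only a sketch under assumed settings (50 equally abundant species, samples of 20 individuals, an arbitrary random seed); it does not implement the Chao and Jost (2015) estimator.

```python
import math
import random

def plug_in_entropy(counts):
    """Observed entropy: sample proportions substituted into Shannon's formula."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

random.seed(1)            # arbitrary seed, for reproducibility only
n_species = 50            # assumed true richness, all species equally abundant
sample_size = 20          # severe undersampling: fewer individuals than species
replicates = 200

true_entropy = math.log(n_species)
estimates = []
for _ in range(replicates):
    sample = [random.randrange(n_species) for _ in range(sample_size)]
    counts = [sample.count(s) for s in range(n_species)]
    estimates.append(plug_in_entropy(counts))

mean_estimate = sum(estimates) / replicates
print(f"true entropy    = {true_entropy:.3f}")
print(f"mean plug-in    = {mean_estimate:.3f}")
print(f"max possible    = {math.log(sample_size):.3f}  (ln of sample size)")
```

In this setting the plug-in entropy can never exceed ln(20), which is below the true value ln(50), so every replicate underestimates the true entropy.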

Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity Dβ and differentiation ΔD measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.

The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.

Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural

TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

Level                        Diversity

3 Hawaiian Archipelago       $D_\gamma = 8.404$

2 Region                     $D_\gamma^{(2)} = D_\gamma$,  $D_\alpha^{(2)} = 8.290$,  $D_\beta^{(2)} = 1.012$

1 Island (community)         $D_\gamma^{(1)} = D_\alpha^{(2)}$,  $D_\alpha^{(1)} = 7.690$,  $D_\beta^{(1)} = 1.065$

Differentiation among aggregates at each level

2 Region                     $\Delta_D^{(2)} = 0.014$

1 Island (community)         $\Delta_D^{(1)} = 0.033$


populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization, and therefore they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.

In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)'s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)'s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
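The link between phylogenetic entropy and phylogenetic diversity of order q = 1 can be sketched on a toy tree. The code below assumes the standard forms I = −Σ L_i a_i ln a_i (Allen et al., 2009) and an effective number of lineages equal to exp(I / T̄), with T̄ = Σ L_i a_i, following the notation of Appendix S2; the three-species tree, its branch lengths and the function names are hypothetical illustrations, not data or code from this study.

```python
import math

def phylogenetic_entropy(branches):
    """I = -sum_i L_i * a_i * ln(a_i) over all branch segments.

    branches: list of (L_i, a_i) pairs, where L_i is the branch length and
    a_i the total relative abundance descended from that branch.
    """
    return -sum(length * a * math.log(a) for length, a in branches if a > 0)

def mean_branch_length(branches):
    """T-bar = sum_i L_i * a_i (equals the tree depth T for an ultrametric tree)."""
    return sum(length * a for length, a in branches)

def phylo_hill_number_q1(branches):
    """Effective number of lineages of order q = 1, assumed to be exp(I / T-bar)."""
    return math.exp(phylogenetic_entropy(branches) / mean_branch_length(branches))

# Hypothetical ultrametric tree of depth 2: species A and B are sisters,
# C is the outgroup. Abundances: A = 0.25, B = 0.25, C = 0.5.
branches = [
    (1.0, 0.25),  # tip branch leading to A
    (1.0, 0.25),  # tip branch leading to B
    (2.0, 0.50),  # tip branch leading to C
    (1.0, 0.50),  # internal branch ancestral to A and B
]

t_bar = mean_branch_length(branches)
phylo_d1 = phylo_hill_number_q1(branches)
# Non-phylogenetic Hill number of order 1 for the same abundances.
plain_d1 = math.exp(-sum(p * math.log(p) for p in (0.25, 0.25, 0.5)))

print(t_bar)                 # 2.0
print(round(phylo_d1, 4))    # 2.3784
print(round(plain_d1, 4))    # 2.8284
```

The phylogenetic value is lower than the abundance-only value because A and B share part of their evolutionary history, so they count as less than two fully distinct lineages.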

There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithms may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.

The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.

An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and in doing so provides new tools to make inferences about biodiversity processes from observed spatial patterns.


ACKNOWLEDGEMENTS

This work was assisted through participation in the "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from the NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by the National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.

DATA ARCHIVING STATEMENT

All data used in this manuscript are available in DRYAD (https://doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).

ORCID

Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493

Christine Edwards http://orcid.org/0000-0001-8837-4872

REFERENCES

Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2

Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101

Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.

Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665

Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.

Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.

Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1 (edited by S. E. Fienberg), 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654

Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551

Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247

Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272

Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45 (edited by D. J. Futuyma), 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540

Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349

Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471

Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014

Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819

Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083

Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1

Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2

Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036

Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75

Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352

Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x

Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1

Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x

Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1

Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x

Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum x falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.


Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.

Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.

Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3

Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x

Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.

Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299

Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x

Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x

Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.

Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x

Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.

Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.

Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696

Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B-Biological Sciences, 283(1829).

Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z

Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765

Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x

Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.

Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422

Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756

de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705

Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.

Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x

Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516

Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.

Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593


APPENDIX S1 Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
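The equivalence stated above can be reproduced from the table. The sketch below assumes the usual heterozygosity-based (order q = 2) definitions, He = 1 − Σ p_i² and effective number of alleles = 1 / Σ p_i²; the function names are hypothetical.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def effective_number_of_alleles(freqs):
    """Number of equally frequent alleles giving the same He: 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)

# Allele frequencies for A1..A5 taken from the table above.
population_1 = [0.5, 0.5, 0.0, 0.0, 0.0]
population_2 = [0.01, 0.10, 0.665, 0.215, 0.01]

he_1 = expected_heterozygosity(population_1)
he_2 = expected_heterozygosity(population_2)
ne_1 = effective_number_of_alleles(population_1)
ne_2 = effective_number_of_alleles(population_2)

print(f"He: {he_1:.3f} vs {he_2:.3f}")
print(f"effective number of alleles: {ne_1:.2f} vs {ne_2:.2f}")
```

Both populations give He of about 0.5 and an effective number of about two, even though population 2 carries five distinct alleles.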

APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol    Definition

${}^{q}D$    abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$    phylogenetic diversity of order q
$S$    total number of distinct elements (species/alleles)
$H$    Shannon entropy
$K$    number of regions
$J_k$    number of local populations/communities within region k
$w_{jk}$    weight given to population/community j of region k
$w_{+k}$    weight given to region k
$D_\alpha^{(l)}$    alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$    beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$    gamma abundance diversity at level l of the hierarchy
$l$    superscript to denote population/community, subregion, region, …
$D_\gamma$    gamma abundance diversity at the ecosystem level
$N_{injk}$    number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$    total number of copies of allele/species i in population/community j of region k
$N_{+jk}$    total number of alleles/individuals in population j of region k
$N_{++k}$    total number of alleles/individuals in region k
$N_{+++}$    total number of alleles/individuals in the ecosystem
$p_{i|jk}$    frequency of allele/species i in population j of region k
$p_{i|+k}$    frequency of allele/species i in region k
$p_{i|++}$    frequency of allele/species i in the ecosystem
$H_{\alpha,jk}^{(1)}$    alpha entropy for population j of region k
$H_{\alpha,k}^{(2)}$    alpha entropy for region k
$H_\alpha^{(l)}$    total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$    abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$    number of branch segments in the phylogenetic tree
$L_i$    length of branch i = 1, 2, 3, …, B
$a_i$    total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$    mean branch length
$T$    depth of an ultrametric tree ($= \bar{T}$)
$Q$    Rao's quadratic entropy
$I$    phylogenetic entropy
$a_{i|jk}$    total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$    total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$    total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$    alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$    beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$    gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$    gamma phylogenetic diversity at the ecosystem level
$I_{\alpha,jk}^{(1)}$    alpha phylogenetic entropy for population j of region k
$I_{\alpha,k}^{(2)}$    alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$    total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$    phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)


APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \dots, w_J)$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

\[ H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \]

and

\[ H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}. \]

Theorem S3.1 For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

\[ 0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j. \]

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \ln x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

\[ -\Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}. \]

Summing over all alleles, we then obtain

\[ -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}. \]

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \dots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

\[ H_\gamma = -\sum_{i=1}^{S} \Big[ \sum_{j=1}^{J} w_j p_{i|j} \ln \Big( \sum_{j'=1}^{J} w_{j'} p_{i|j'} \Big) \Big] \le -\sum_{i=1}^{S} \Big[ \sum_{j=1}^{J} w_j p_{i|j} \ln \big( w_j p_{i|j} \big) \Big] \]

\[ = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j. \]

The inequality holds because $\sum_{j'} w_{j'} p_{i|j'} \ge w_j p_{i|j}$ for every j. When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.


From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

\[ \Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1} \]
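The bounds of Theorem S3.1 and the normalization in Eq. (C1) can be checked numerically. The following is a minimal sketch with unequal, arbitrarily chosen population weights and made-up allele frequencies; the function names are hypothetical.

```python
import math

def entropy(freqs):
    """Shannon entropy with natural logarithms; zero frequencies contribute nothing."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def two_level_entropies(freq_rows, weights):
    """Return (H_gamma, H_alpha, upper_bound) for J populations.

    freq_rows[j][i] is p_{i|j}; weights[j] is w_j (weights sum to 1).
    upper_bound is -sum_j w_j ln w_j, the maximum of H_gamma - H_alpha.
    """
    n_alleles = len(freq_rows[0])
    pooled = [sum(w * row[i] for row, w in zip(freq_rows, weights))
              for i in range(n_alleles)]
    h_gamma = entropy(pooled)
    h_alpha = sum(w * entropy(row) for row, w in zip(freq_rows, weights))
    return h_gamma, h_alpha, entropy(weights)

def shannon_differentiation(freq_rows, weights):
    """Eq. (C1): (H_gamma - H_alpha) / (-sum_j w_j ln w_j)."""
    h_gamma, h_alpha, upper_bound = two_level_entropies(freq_rows, weights)
    return (h_gamma - h_alpha) / upper_bound

weights = [0.2, 0.3, 0.5]  # arbitrary unequal weights

identical = [[0.6, 0.3, 0.1]] * 3                      # same distribution everywhere
distinct = [[1.0, 0.0, 0.0, 0.0, 0.0],                 # no allele is shared
            [0.0, 0.5, 0.5, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.25, 0.75]]
partial = [[0.5, 0.5, 0.0],                            # some alleles shared
           [0.5, 0.25, 0.25],
           [0.0, 0.5, 0.5]]

delta_identical = shannon_differentiation(identical, weights)
delta_distinct = shannon_differentiation(distinct, weights)
delta_partial = shannon_differentiation(partial, weights)

print(round(delta_identical, 6), round(delta_distinct, 6), round(delta_partial, 6))
```

Up to floating-point error, identical populations give 0, completely distinct populations give 1, and partially overlapping populations fall strictly in between.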

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2 Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:

(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.

(A2) Shannon differentiation measure is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: if multiple communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.

Proof. For Part (A), see Appendix S6 in Chao, Jost et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++},$$

$$H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix B for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3: For the decomposition given in Eq. (C2) in the three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have

$$H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} \quad \text{and} \quad H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}.$$

Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions, with allele relative frequency $p_{i|+k}$ for species i in region k, and with the region weights $\{w_{+1}, w_{+2}, \ldots, w_{+K}\}$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} = \sum_{k=1}^{K} w_{+k}\, p_{i|+k}.$$

Then the “gamma” entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \left(\sum_{k=1}^{K} w_{+k}\, p_{i|+k}\right) \ln \left(\sum_{m=1}^{K} w_{+m}\, p_{i|+m}\right).$$

The corresponding “alpha” entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem C1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations, with allele relative frequency $p_{i|jk}$ for allele i in population j, and with population weights $\{w_{1k}/w_{+k},\, w_{2k}/w_{+k},\, \ldots,\, w_{J_k k}/w_{+k}\}$, $j = 1, 2, \ldots, J_k$. The “gamma” entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding “alpha” entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem C1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4). From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \left(w_{jk}/w_{+k}\right)}. \tag{C6}$$

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared species). Note that in the latter case, we can decompose the gamma diversity as

$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln \left(w_{jk}\, p_{i|jk}\right) \right\} \\
&= -\sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \left[ \ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k} \right] \right\} \\
&= -\sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln p_{i|jk} \right\} - \sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln \frac{w_{jk}}{w_{+k}} \right\} - \sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln w_{+k} \right\} \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right].
\end{aligned}$$

In this special case, we have

$$H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} \quad \text{and} \quad H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}.$$
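The bounds in Eqs. (C3) and (C4) and the two extreme cases can be verified on small made-up data. The Python sketch below is illustrative only: the function name `three_level_differentiation`, the region labels and the count vectors are hypothetical, and it assumes that the weights are relative sample sizes ($w_{jk} = n_{jk}/n$), which is one of the weighting choices the appendix allows.

```python
import math


def entropy(p):
    """Shannon entropy -sum p ln p, skipping zero entries."""
    return -sum(x * math.log(x) for x in p if x > 0)


def three_level_differentiation(counts, region_of):
    """counts[j]: allele counts of population j; region_of[j]: its region label.

    Assumption: weights are relative sample sizes, w_jk = n_jk / n.
    Returns (Delta_D^(2), Delta_D^(1)) as in Eqs. (C5) and (C6).
    """
    n = sum(sum(c) for c in counts)
    n_alleles = len(counts[0])
    w_pop = [sum(c) / n for c in counts]

    # Pool the populations of each region.
    pooled = {}
    for c, r in zip(counts, region_of):
        acc = pooled.setdefault(r, [0] * n_alleles)
        for i, x in enumerate(c):
            acc[i] += x
    w_reg = {r: sum(c) / n for r, c in pooled.items()}

    total = [sum(c[i] for c in counts) for i in range(n_alleles)]
    h_gamma = entropy([x / n for x in total])
    h_alpha2 = sum(w_reg[r] * entropy([x / sum(c) for x in c])
                   for r, c in pooled.items())
    h_alpha1 = sum(w * entropy([x / sum(c) for x in c])
                   for w, c in zip(w_pop, counts))

    max_among_regions = entropy(w_reg.values())            # bound in (C3)
    max_within_regions = -sum(w * math.log(w / w_reg[r])   # bound in (C4)
                              for w, r in zip(w_pop, region_of))
    assert -1e-12 <= h_gamma - h_alpha2 <= max_among_regions + 1e-12
    assert -1e-12 <= h_alpha2 - h_alpha1 <= max_within_regions + 1e-12
    return ((h_gamma - h_alpha2) / max_among_regions,
            (h_alpha2 - h_alpha1) / max_within_regions)


regions = ["R1", "R1", "R2", "R2"]
# No allele is shared between any two populations.
distinct = three_level_differentiation(
    [[5, 5, 0, 0, 0, 0, 0, 0], [0, 0, 10, 0, 0, 0, 0, 0],
     [0, 0, 0, 3, 7, 0, 0, 0], [0, 0, 0, 0, 0, 4, 4, 2]], regions)
# All populations share the same relative frequencies (0.2, 0.6, 0.2).
identical = three_level_differentiation(
    [[2, 6, 2], [1, 3, 1], [4, 12, 4], [3, 9, 3]], regions)
print(distinct, identical)
```

In the completely distinct case both measures equal 1, and in the identical case both equal 0, up to floating-point error.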

As we proved in Theorem C3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation satisfies the corresponding monotonicity and “true dissimilarity” properties stated in Theorem C2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. Shannon gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i} L_i\, a_{i|++} \ln a_{i|++},$$

$$I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i} L_i\, a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i} L_i\, a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \left[I_\alpha^{(2)} - I_\alpha^{(1)}\right] + \left[I_\gamma - I_\alpha^{(2)}\right]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem C4: For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have

$$I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} \quad \text{and} \quad I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}.$$

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and “true dissimilarity” properties, are parallel to those of allelic diversity and thus are omitted.
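To make the role of the tree depth T in the phylogenetic bound concrete, here is a minimal Python sketch for a two-level case (populations within a single region). Everything in it is hypothetical: the four-tip ultrametric tree ((sp1,sp2),(sp3,sp4)) with all branch lengths equal to 1 (so T = 2), the constant `BRANCHES`, and the function names. It uses the phylogenetic entropy $I = -\sum_i L_i a_i \ln a_i$, with $a_i$ the total relative abundance descended from branch i.

```python
import math

# Hypothetical ultrametric tree ((sp1,sp2),(sp3,sp4)) of depth T = 2.
# Each branch is (length L_i, set of tip indices descended from it).
BRANCHES = [
    (1.0, {0}), (1.0, {1}), (1.0, {2}), (1.0, {3}),   # terminal branches
    (1.0, {0, 1}), (1.0, {2, 3}),                     # interior branches
]
T = 2.0


def phylo_entropy(p):
    """I = -sum_i L_i a_i ln a_i over all branches with a_i > 0."""
    total = 0.0
    for length, tips in BRANCHES:
        a = sum(p[t] for t in tips)
        if a > 0:
            total -= length * a * math.log(a)
    return total


def phylo_differentiation(pops, weights):
    """(I_gamma - I_alpha) divided by its maximum, -T * sum_j w_j ln w_j."""
    pooled = [sum(w * p[i] for w, p in zip(weights, pops)) for i in range(4)]
    i_gamma = phylo_entropy(pooled)
    i_alpha = sum(w * phylo_entropy(p) for w, p in zip(weights, pops))
    bound = -T * sum(w * math.log(w) for w in weights)
    return (i_gamma - i_alpha) / bound


# Populations occupy different clades: no branch is shared.
no_shared_branches = phylo_differentiation(
    [[0.2, 0.8, 0, 0], [0, 0, 0.6, 0.4]], [0.3, 0.7])
# Populations share no species but share both interior branches.
shared_interior = phylo_differentiation(
    [[0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5]], [0.5, 0.5])
print(no_shared_branches, shared_interior)
```

The first case reaches the maximum of 1. In the second case no species is shared, yet the shared interior branches halve the normalized phylogenetic differentiation in this toy tree.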

SUPPLEMENTARY INFORMATION

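Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represents the ecosystem. Shannon entropies ($H$) are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 diagram: columns Within, Between, Total and Decomposition; rows Level 3 (Ecosystem), Level 2 (Region) and Level 1 (Population or Community). The frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$ (populations), $p_{i|+1}$, $p_{i|+2}$ (regions) and $p_{i|++}$ (ecosystem) give the entropies $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$, $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$, $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$, $H_\alpha^{(1)}$, $H_\alpha^{(2)}$ and $H_\gamma$.]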

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set, where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2 diagram: tips $p_1$ to $p_5$; interior nodes $p_1 + p_2$, $p_4 + p_5$ and $p_1 + p_2 + p_3$; root labelled MRCA.]

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R function iDIP)

Input should include two data matrices (called “Abun” and “Struc”, respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as the following:

    Ecosystem
        Region 1:  Population 1, Population 2
        Region 2:  Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

             Pop1   Pop2   Pop3   Pop4   Pop5
    Allele1     1     16      2     10     15
    Allele2     0      0      0      5     14
    Allele3     7     12     11      1      0
    Allele4     0      5     14      1     21
    Allele5     2      1      0     11     10
    Allele6     0      1      3      2      0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level; Level 2 = region level; Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

             Pop1          Pop2          Pop3          Pop4          Pop5
    Level3   Ecosystem     Ecosystem     Ecosystem     Ecosystem     Ecosystem
    Level2   Region1       Region1       Region2       Region2       Region2
    Level1   Population1   Population2   Population3   Population4   Population5

For the above simple example, input data for R function iDIP (code given below) are shown below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

    # Run R function
    iDIP(Data, Struc)

Output is given below:

                        [,1]
    D_gamma            5.272
    D_alpha2           4.679
    D_alpha1           3.540
    D_beta2            1.127
    D_beta1            1.322
    Proportion2        0.300
    Proportion1        0.700
    Differentiation2   0.204
    Differentiation1   0.310

We give simple interpretations for the above output (all the “effective” is in a sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 is interpreted as that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha2 = 4.679 is interpreted as that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 × 1.127 = 5.272 (= D_gamma).

(1c) D_alpha1 = 3.540 is interpreted as that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 is interpreted as that there are 1.32 population equivalents per region. Here 1.322 × 3.540 = 4.679 allele equivalents per region (= D_alpha2).

(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
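As an independent cross-check of the worked example above, the following Python sketch recomputes the same quantities from the allele-count matrix. It is not the authors' R function: the names `entropy` and `idip` are hypothetical, and the sketch assumes, as in the example, that weights are proportional to the column totals and that the first row of the structure matrix is the ecosystem and the last row the populations.

```python
import math


def entropy(values):
    """Shannon entropy of the relative frequencies implied by the values."""
    total = sum(values)
    return -sum(v / total * math.log(v / total) for v in values if v > 0)


def idip(abun, struc):
    """abun[s][k]: frequency of allele s in population k.
    struc[i][k]: aggregate containing population k at row i
    (row 0 = ecosystem, last row = populations)."""
    n_levels, n_pops = len(struc), len(abun[0])
    h_gamma = entropy([sum(row) for row in abun])
    alpha, weight_entropy = [], []        # one entry per row 1..n_levels-1
    for labels in struc[1:]:
        pooled = {}
        for k in range(n_pops):
            acc = pooled.setdefault(labels[k], [0.0] * len(abun))
            for s, row in enumerate(abun):
                acc[s] += row[k]
        sizes = [sum(acc) for acc in pooled.values()]
        n = sum(sizes)
        weight_entropy.append(entropy(sizes))
        alpha.append(sum(sz / n * entropy(acc)
                         for sz, acc in zip(sizes, pooled.values())))
    total_beta = h_gamma - alpha[-1]
    out = {"D_gamma": math.exp(h_gamma)}
    upper_h, upper_w = h_gamma, 0.0
    for idx, (h, w) in enumerate(zip(alpha, weight_entropy)):
        level = n_levels - 1 - idx        # 2 = region, 1 = population here
        out[f"D_alpha{level}"] = math.exp(h)
        out[f"D_beta{level}"] = math.exp(upper_h - h)
        out[f"Proportion{level}"] = (upper_h - h) / total_beta
        out[f"Differentiation{level}"] = (upper_h - h) / (w - upper_w)
        upper_h, upper_w = h, w
    return out


abun = [[1, 16, 2, 10, 15], [0, 0, 0, 5, 14], [7, 12, 11, 1, 0],
        [0, 5, 14, 1, 21], [2, 1, 0, 11, 10], [0, 1, 3, 2, 0]]
struc = [["Ecosystem"] * 5,
         ["Region1", "Region1", "Region2", "Region2", "Region2"],
         ["Population%d" % k for k in range(1, 6)]]
result = idip(abun, struc)
for name, value in result.items():
    print(name, round(value, 3))
```

Under these assumptions, the printed values agree with the output table above to three decimals.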

    # R code for Information-based Diversity Partitioning for Any Number of Levels
    #
    # Input data:
    #   abun:  alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
    #   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
    # Output:
    #   (1) Gamma (or total) diversity, alpha and beta diversity at each level
    #   (2) Proportion of total beta information found at each level
    #   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
    #       measures the mean dissimilarity among populations (level 1) within a region;
    #       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
    # NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
    #       or "NA" in your data by 0 or any numerical value. Also, there must be at least
    #       one species/allele in a population or a community.

    # Main function iDIP
    iDIP = function(abun, struc){
      n = sum(abun); N = ncol(abun);
      ga = rowSums(abun);
      gp = ga[ga>0]/n;
      G = sum(-gp*log(gp));                      # gamma (ecosystem) entropy
      S = length(gp);

      H = nrow(struc);
      A = numeric(H-1); W = numeric(H-1); B = numeric(H-1);
      Diff = numeric(H-1); Prop = numeric(H-1);

      # Lowest level (populations): weights, weight entropy and alpha entropy
      wi = colSums(abun)/n;
      W[H-1] = -sum(wi[wi>0]*log(wi[wi>0]));
      pi = sapply(1:N, function(k) abun[,k]/sum(abun[,k]));
      Ai = sapply(1:N, function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])));
      A[H-1] = sum(wi*Ai);

      # Intermediate levels: pool the populations belonging to each aggregate
      if(H>2){
        for(i in 2:(H-1)){
          I = unique(struc[i,]); NN = length(I);
          ai = matrix(0, ncol=NN, nrow=nrow(abun));
          for(j in 1:NN){
            II = which(struc[i,]==I[j]);
            if(length(II)==1){ ai[,j] = abun[,II];
            }else{ ai[,j] = rowSums(abun[,II]) }
          }
          pi = sapply(1:NN, function(k) ai[,k]/sum(ai[,k]));
          wi = colSums(ai)/sum(ai);
          W[i-1] = -sum(wi*log(wi));
          Ai = sapply(1:NN, function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])));
          A[i-1] = sum(wi*Ai);
        }
      }

      total = G-A[H-1];
      Diff[1] = (G-A[1])/W[1];
      Prop[1] = (G-A[1])/total;
      B[1] = exp(G)/exp(A[1]);
      if(H>2){
        for(i in 2:(H-1)){
          Diff[i] = (A[i-1]-A[i])/(W[i]-W[i-1]);
          Prop[i] = (A[i-1]-A[i])/total;
          B[i] = exp(A[i-1])/exp(A[i]);
        }
      }

      Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop;
      out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol=1)
      rownames(out) <- c(paste0("D_gamma"),
                         paste0("D_alpha", (H-1):1),
                         paste0("D_beta", (H-1):1),
                         paste0("Proportion", (H-1):1),
                         paste0("Differentiation", (H-1):1)
      )
      return(out)
    }

Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called “Abun” and “Struc”, respectively) as described in the species diversity, we also need to input a phylogenetic “Tree” (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the “Abun” matrix should be exactly the same as those in the uploaded Newick tree format. iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying hierarchical structure matrix; see the simple example given above for the species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundances data as in the species diversity for illustration. A simulated phylogenetic tree for 6 species is given below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    rownames(Data) = paste0("Allele",1:6)
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
    Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

    # Run R function
    iDIPphylo(Data, Struc, Tree)

Output is given below:

                    [,1]
    Faith's PD   321.525
    mean_T        94.169
    PD_gamma     274.388
    PD_alpha2    255.194
    PD_alpha1    223.231
    PD_beta2       1.075
    PD_beta1       1.143
    PD_prop2       0.351
    PD_prop1       0.649
    PD_diff2       0.124
    PD_diff1       0.149

The “effective” in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 is interpreted as that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha2 = 255.194 is interpreted as that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 × 1.075 = 274.388 (= PD_gamma).

(3c) PD_alpha1 = 223.231 is interpreted as that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 × 1.143 = 255.194 (= PD_alpha2).

(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found in the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found in the community level is 64.9%.

(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.5%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
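The first three output rows (Faith's PD, mean_T and PD_gamma) can be recomputed by hand from the example tree and the pooled allele counts. The Python sketch below does this without parsing Newick. The branch list `BRANCHES` is a manual transcription of the example tree and is an assumption of this sketch, as is the function name `phylo_summary`. It uses the order-1 phylogenetic diversity $T \exp\{-\sum_i L_i (a_i/T) \ln a_i\}$, with $a_i$ the relative abundance descended from branch i and $T = \sum_i L_i a_i$.

```python
import math

# Example tree written as (branch length, indices of alleles descended from
# the branch); indices 0..5 stand for Allele1..Allele6.
BRANCHES = [
    (16.66254448, {0}), (28.86156926, {1}), (43.70264926, {0, 1}),
    (59.19367445, {2}), (43.49065302, {0, 1, 2}),
    (9.67060281, {3}), (49.65919121, {4}), (15.361314, {5}),
    (54.92297125, {3, 4, 5}),
]
ABUN = [[1, 16, 2, 10, 15], [0, 0, 0, 5, 14], [7, 12, 11, 1, 0],
        [0, 5, 14, 1, 21], [2, 1, 0, 11, 10], [0, 1, 3, 2, 0]]


def phylo_summary(abun):
    """Return (Faith's PD, mean_T, PD_gamma) for the pooled ecosystem."""
    n = sum(sum(row) for row in abun)
    rel = [sum(row) / n for row in abun]          # pooled allele frequencies
    node_abund = [(length, sum(rel[t] for t in tips))
                  for length, tips in BRANCHES]
    faith_pd = sum(length for length, a in node_abund if a > 0)
    mean_t = sum(length * a for length, a in node_abund)
    h = -sum(length * a / mean_t * math.log(a / mean_t)
             for length, a in node_abund if a > 0)
    return faith_pd, mean_t, math.exp(h)


faith_pd, mean_t, pd_gamma = phylo_summary(ABUN)
print(round(faith_pd, 3), round(mean_t, 3), round(pd_gamma, 3))
```

Because the branch lengths weighted by $a_i/T$ sum to one, exponentiating the entropy of the rescaled abundances $a_i/T$ already yields a quantity in branch-length units, which is why no further multiplication by T appears.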

    # R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
    #
    # Input data:
    #   abun:  alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
    #   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
    #   tree:  a Newick-format phylogenetic tree spanned by all focal species considered in a study
    # NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank"
    #       or "NA" in your data by 0 or any numerical value. Also, there must be at least
    #       one species/allele in a population or a community.
    # Output:
    #   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
    #   (2) mean T: weighted (by species abundance) mean of the distances from root node
    #       to each of the tips in a phylogenetic tree. For an ultrametric tree, mean T = tree depth
    #   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level
    #   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta
    #       information found in Level 1 and Level 2, respectively
    #   (5) Phylogenetic differentiation (dissimilarity) for each level. For example,
    #       PD_diff1 measures the mean phylogenetic dissimilarity among communities (Level 1)
    #       within a region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity
    #       among regions (Level 2), etc.

    # Three packages ("ade4", "ape" and "phytools") must be installed first
    install.packages("ade4"); library(ade4)
    install.packages("ape"); library(ape)
    install.packages("phytools"); library(phytools)

    # Main function iDIPphylo
    iDIPphylo = function(abun, struc, tree){
      phyloData <- newick2phylog(tree)
      Temp <- as.matrix(abun[names(phyloData$leaves), ])
      nodenames = c(names(phyloData$leaves), names(phyloData$nodes));
      M = matrix(0, nrow=length(phyloData$leaves), ncol=length(nodenames),
                 dimnames=list(names(phyloData$leaves), nodenames))
      for(i in 1:length(phyloData$leaves)){
        M[i,][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
      }

      # Node-by-population abundances and branch lengths
      pA = matrix(0, ncol=ncol(abun), nrow=length(nodenames),
                  dimnames=list(nodenames, colnames(abun)))
      for(i in 1:ncol(abun)){ pA[,i] = Temp[,i] %*% M }
      pB = c(phyloData$leaves, phyloData$nodes)

      n = sum(abun); N = ncol(abun);
      ga = rowSums(pA);
      gp = ga/n; TT = sum(gp*pB);
      G = sum(-pB[gp>0]*gp[gp>0]/TT*log(gp[gp>0]/TT))
      PD = sum(pB[gp>0]);

      H = nrow(struc);
      A = numeric(H-1); W = numeric(H-1); B = numeric(H-1);
      Diff = numeric(H-1); Prop = numeric(H-1);

      wi = colSums(abun)/n;
      W[H-1] = -sum(wi[wi>0]*log(wi[wi>0]));
      pi = sapply(1:N, function(k) pA[,k]/sum(abun[,k]));
      Ai = sapply(1:N, function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)));
      A[H-1] = sum(wi*Ai);

      if(H>2){
        for(i in 2:(H-1)){
          I = unique(struc[i,]); NN = length(I);
          pi = matrix(0, ncol=NN, nrow=nrow(pA)); ni = numeric(NN);
          for(j in 1:NN){
            II = which(struc[i,]==I[j]);
            if(length(II)==1){ pi[,j] = pA[,II]/sum(abun[,II]); ni[j] = sum(abun[,II]);
            }else{ pi[,j] = rowSums(pA[,II])/sum(abun[,II]); ni[j] = sum(abun[,II]) }
          }
          # pi = sapply(1:NN, function(k) ai[,k]/sum(ai[,k]));
          wi = ni/sum(ni);
          W[i-1] = -sum(wi*log(wi));
          Ai = sapply(1:NN, function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)));
          A[i-1] = sum(wi*Ai);
        }
      }

      total = G-A[H-1];
      Diff[1] = (G-A[1])/W[1];
      Prop[1] = (G-A[1])/total;
      B[1] = exp(G)/exp(A[1]);
      if(H>2){
        for(i in 2:(H-1)){
          Diff[i] = (A[i-1]-A[i])/(W[i]-W[i-1]);
          Prop[i] = (A[i-1]-A[i])/total;
          B[i] = exp(A[i-1])/exp(A[i]);
        }
      }

      # Gamma = exp(G)/TT; Alpha = exp(A)/TT; Diff = Diff; Prop = Prop;
      Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop;
      out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol=1)
      # out1 = iDIP(abun, struc)
      # out = cbind(out1, out2)
      rownames(out) <- c(paste("Faith's PD"),
                         paste("mean_T"),
                         paste0("PD_gamma"),
                         paste0("PD_alpha", (H-1):1),
                         paste0("PD_beta", (H-1):1),
                         paste0("PD_prop", (H-1):1),
                         paste0("PD_diff", (H-1):1)
      )
      return(out)
    }


in which different δ values would be used among populations, sub-regions and regions, then the beta and differentiation components would follow different trends in relation to the strength in spatial genetic variation.

For the sake of space, we do not show how the total effective number of alleles in the ecosystem (γ diversity) changes as a function of the strength of the spatial genetic structure, but values increase monotonically with δ: $D_\gamma$ = 16 on average across simulations for δ = 0.1, up to $D_\gamma$ = 19 for δ = 6. In other words, the effective total number of alleles increases as genetic structure decreases. In terms of an equilibrium island model, this means that migration helps increase total genetic variability. In terms of a fission model without migration, this could be interpreted as a reduced effect of genetic drift as the gene tree approaches a star phylogeny (see Slatkin & Hudson, 1991). Note, however, that these results depend on the total number of populations, which is relatively large in our example; under a scenario where the total number of populations is small, we could obtain a very different result (e.g., migration decreasing total genetic diversity). Our goal here was to present a simple simulation so that users can gain a good understanding of how these components can be used to interpret genetic variation across different spatial scales (here region, subregions and populations). Note that we concentrated on spatial genetic structure among populations as a metric, but we could have used the same simulation protocol to simulate abundance distributions or trait variation among populations across different spatial scales, though the results would follow the same patterns as for the ones we found here. Moreover, for simplicity, we only considered population variation within one species, but multiple species could have been equally considered, including a phylogenetic structure among them.

7 | APPLICATION TO A REAL DATABASE: BIODIVERSITY OF THE HAWAIIAN CORAL REEF ECOSYSTEM

All the above derivations are based on the assumption that we know the population abundances and allele frequencies, which is never true. Instead, estimations are based on allele count samples and species abundance estimations. Usually, these estimations are obtained independently such that the sample size of individuals in a population differs from the sample size of individuals for which we have allele counts. Here, we present an example of the application of our framework to the Hawaiian coral reef ecosystem using fish species density estimates obtained from NOAA cruises (Williams et al., 2015) and microsatellite data for two species, a deep-water fish Etelis coruscans (Andrews et al., 2014) and a shallow-water fish Zebrasoma flavescens (Eble et al., 2011).

The Hawaiian archipelago (Figure 4) consists of two regions: the Main Hawaiian Islands (MHI), which are high volcanic islands with many areas subject to heavy anthropogenic perturbations (land-based pollution, overfishing, habitat destruction and alien species), and the Northwestern Hawaiian Islands (NWHI), which are a string of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km south-west of the MHI, at Johnston Atoll, and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).

FIGURE 4 Study domain spanning the Hawaiian Archipelago and Johnston Atoll. Contour lines delineate 1,000 and 2,000 m isobaths. Green indicates large landmass

We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.

7.1 | Species diversity

Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, $D_\gamma$, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_\alpha^{(2)}$ = 37.77; Island: $D_\alpha^{(1)}$ = 27.75). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_\beta^{(2)}$ = 1.29 represents the number of region equivalents in the Hawaiian archipelago, while $D_\beta^{(1)}$ = 1.361 is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain $\Delta_D$ (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions than among islands within a region.
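The relations between the Table 4 entries can be rechecked with a few lines of arithmetic. The Python sketch below uses only the three diversities reported in Table 4; the variable names are illustrative. It recovers the beta diversities as ratios of successive levels and shows that the decomposition is multiplicative for effective numbers, hence additive for entropies.

```python
import math

# Effective numbers of species reported in Table 4 (order q = 1).
d_gamma, d_alpha2, d_alpha1 = 48.744, 37.773, 27.752

d_beta2 = d_gamma / d_alpha2    # region equivalents; Table 4 reports 1.290
d_beta1 = d_alpha2 / d_alpha1   # island equivalents per region; Table 4 reports 1.361

# Species equivalents lost when descending one level ("approximately 10").
lost_to_region = d_gamma - d_alpha2
lost_to_island = d_alpha2 - d_alpha1

# Multiplicative for diversities, additive for entropies (logarithms).
log_sum = math.log(d_alpha1) + math.log(d_beta1) + math.log(d_beta2)

print(round(d_beta2, 3), round(d_beta1, 3))
print(round(lost_to_region, 2), round(lost_to_island, 2))
```

Small discrepancies in the last digit are expected because the tabulated values are themselves rounded.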

We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than those of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and, to a lesser extent, Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.

Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.

TABLE 4 Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

    Level                      Diversity
    3 Hawaiian Archipelago     $D_\gamma$ = 48.744
    2 Region                   $D_\gamma^{(2)} = D_\gamma$;  $D_\alpha^{(2)}$ = 37.773;  $D_\beta^{(2)}$ = 1.290
    1 Island (community)       $D_\gamma^{(1)} = D_\alpha^{(2)}$;  $D_\alpha^{(1)}$ = 27.752;  $D_\beta^{(1)}$ = 1.361

    Differentiation among aggregates at each level
    2 Region                   $\Delta_D^{(2)}$ = 0.290
    1 Island (community)       $\Delta_D^{(1)}$ = 0.153

7.2 | Genetic diversity

Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that, despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.

Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.

As it was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).

8 | DISCUSSION

Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.

Hillnumbersareparameterizedbyorderq which determines the sensitivity of the diversity measure to common and rare elements (al-lelesorspecies)Our framework isbasedonaHillnumberoforderq=1 which weights all elements in proportion to their frequencyand leadstodiversitymeasuresbasedonShannonrsquosentropyThis isafundamentallyimportantpropertyfromapopulationgeneticspointofviewbecauseitcontrastswithmeasuresbasedonheterozygositywhich are of order q=2andthereforegiveadisproportionateweighttocommonalleles Indeed it iswellknownthatheterozygosityandrelatedmeasuresareinsensitivetochangesintheallelefrequenciesofrarealleles(egAllendorfLuikartampAitken2012)sotheyperformpoorly when used on their own to detect important demographicchanges in theevolutionaryhistoryofpopulationsandspecies (eg

F IGURE 5emspDiversitymeasuresatallsampledislands(communitiespopulations)expressedintermsofHillnumbersofordersq = 0 1 and 2(a)FishspeciesdiversityofHawaiiancoralreefcommunities(b)geneticdiversityforEtelis coruscans(c)geneticdiversityforZebrasoma flavescens

(a) species diversity (b) E coruscans

(c) Z flabescens

14emsp |emsp emspensp GAGGIOTTI eT Al

bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that allele/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts the Hill number with respect to the order q ≥ 0, contains all information about allele/species abundance distributions.
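To make the comparison of orders q = 0, 1 and 2 concrete, the following minimal Python sketch computes Hill numbers from a vector of relative frequencies. It is an illustration only: the function name `hill_number` and the two frequency vectors are hypothetical and are not taken from the Hawaiian data set or from the R package mentioned in the text.

```python
import math

def hill_number(freqs, q):
    """Hill number (effective number of elements) of order q."""
    # Drop absent elements and renormalize so that frequencies sum to 1.
    p = [x for x in freqs if x > 0]
    total = sum(p)
    p = [x / total for x in p]
    if q == 1:
        # Limit as q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

# Hypothetical allele frequencies: one dominant allele plus several rare ones.
uneven = [0.90, 0.04, 0.03, 0.02, 0.01]
# Hypothetical perfectly even population with the same number of alleles.
even = [0.2] * 5

profile_uneven = {q: hill_number(uneven, q) for q in (0, 1, 2)}
profile_even = {q: hill_number(even, q) for q in (0, 1, 2)}
```

In the even case all three orders coincide with the number of alleles, whereas in the uneven case the order 0 value (richness) is much larger than the values of order 1 and 2, which is the pattern described above for populations containing several rare alleles.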

As proved by Chao et al. (2015, Appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population; and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost’s D, do not possess either of these two properties.

Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and “true dissimilarity” properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and

FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness; (b) allelic richness for Etelis coruscans; (c) allelic richness for Zebrasoma flavescens

TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level                      Diversity
3 Hawaiian Archipelago     $D_\gamma = 8.249$
2 Region                   $D_\gamma^{(2)} = D_\gamma$;  $D_\alpha^{(2)} = 8.083$;  $D_\beta^{(2)} = 1.016$
1 Island (population)      $D_\gamma^{(1)} = D_\alpha^{(2)}$;  $D_\alpha^{(1)} = 7.077$;  $D_\beta^{(1)} = 1.117$

Level                      Differentiation among aggregates at each level
2 Region                   $\Delta_D^{(2)} = 0.023$
1 Island (community)       $\Delta_D^{(1)} = 0.062$
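The nested structure of the table (gamma diversity at one level equals alpha diversity at the level above, and beta is the ratio of gamma to alpha at each level) can be sketched in Python as follows. This is a hedged illustration under stated assumptions: entropies are weighted averages as defined in Appendix S2, diversities are exponentials of entropies, and the function `decompose`, the frequency vectors and the equal weights are all hypothetical. Because the published tables report averages over loci, ratios of the tabulated averages need not reproduce the tabulated beta values exactly.

```python
import math

def shannon(p):
    """Shannon entropy (natural log) of a frequency vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def decompose(regions, weights):
    """Three-level decomposition (ecosystem > region > population), order q = 1.

    regions: list of regions, each a list of population frequency vectors.
    weights: nested list of population weights w_jk (summing to 1 overall).
    """
    n = len(regions[0][0])
    w_region = [sum(ws) for ws in weights]  # region weights w_{+k}
    # Pooled frequencies within each region, p_{i|+k}.
    p_region = [
        [sum(w * pop[i] for w, pop in zip(ws, pops)) / wk for i in range(n)]
        for pops, ws, wk in zip(regions, weights, w_region)
    ]
    # Pooled frequencies in the whole ecosystem, p_{i|++}.
    p_eco = [sum(wk * pr[i] for wk, pr in zip(w_region, p_region)) for i in range(n)]
    h_gamma = shannon(p_eco)
    h_alpha2 = sum(wk * shannon(pr) for wk, pr in zip(w_region, p_region))
    h_alpha1 = sum(
        w * shannon(pop)
        for pops, ws in zip(regions, weights)
        for w, pop in zip(ws, pops)
    )
    d_gamma, d_alpha2, d_alpha1 = (math.exp(h) for h in (h_gamma, h_alpha2, h_alpha1))
    return {
        "D_gamma": d_gamma,
        "D_alpha2": d_alpha2,
        "D_beta2": d_gamma / d_alpha2,   # among regions
        "D_alpha1": d_alpha1,
        "D_beta1": d_alpha2 / d_alpha1,  # among populations within regions
    }

# Hypothetical data: two regions with two populations each, four alleles.
regions = [
    [[0.7, 0.2, 0.1, 0.0], [0.6, 0.3, 0.1, 0.0]],
    [[0.1, 0.2, 0.3, 0.4], [0.0, 0.1, 0.4, 0.5]],
]
weights = [[0.25, 0.25], [0.25, 0.25]]
result = decompose(regions, weights)

# Control case: all populations identical, so both beta components equal 1.
identical = decompose([[[0.5, 0.5]] * 2] * 2, weights)
```

By construction, the gamma diversity factors as the product of the island-level alpha diversity and the two beta components, which is what gives each level an interpretation in terms of effective numbers.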


our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.

Recently, Karlin and Smouse (2017, Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the “true dissimilarity” property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of “true dissimilarity” can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse’s (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure “true dissimilarity.” Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse’s measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
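The two-population example above can be checked numerically. The sketch below implements the two-level Shannon differentiation of Appendix S3 (Eq. C1) and applies it to two populations with 10 equally frequent alleles each, 4 of them shared. Equal population weights are an assumption of this illustration, and the function names are hypothetical.

```python
import math

def shannon(p):
    """Shannon entropy (natural log) of a frequency vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def shannon_differentiation(pops, weights):
    """Normalized Shannon differentiation for a two-level hierarchy."""
    n = len(pops[0])
    # Pooled (ecosystem-level) allele frequencies.
    pooled = [sum(w * pop[i] for w, pop in zip(weights, pops)) for i in range(n)]
    h_gamma = shannon(pooled)
    h_alpha = sum(w * shannon(pop) for w, pop in zip(weights, pops))
    # Normalize by the maximum possible gap, -sum_j w_j ln w_j.
    return (h_gamma - h_alpha) / shannon(weights)

# 16 alleles in total: 6 private to population I, 4 shared, 6 private to II.
pop_I = [0.1] * 10 + [0.0] * 6
pop_II = [0.0] * 6 + [0.1] * 10
delta = shannon_differentiation([pop_I, pop_II], [0.5, 0.5])
```

Here the entropy gap equals 0.6 ln 2 and the normalizing constant equals ln 2, so `delta` evaluates to 0.6 (up to floating-point error), the true proportion of nonshared alleles, 1 − 4/10.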

In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown, and all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would surely be observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., obtained by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
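The negative bias of the observed (plug-in) entropy can be illustrated with a small simulation. The sketch below is not the estimator of Chao and Jost (2015); it only shows the direction of the bias under assumed conditions. The true frequencies, the sample size of 20 and the number of replicates are hypothetical choices.

```python
import math
import random

def shannon(p):
    """Shannon entropy (natural log) of a frequency vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def plug_in_entropy(sample):
    """Observed entropy: sample proportions substituted into the formula."""
    n = len(sample)
    counts = {}
    for item in sample:
        counts[item] = counts.get(item, 0) + 1
    return shannon([c / n for c in counts.values()])

random.seed(1)
# Hypothetical true allele frequencies with two rare alleles.
true_p = [0.5, 0.2, 0.1, 0.1, 0.05, 0.05]
true_h = shannon(true_p)

n_reps, n_sampled = 2000, 20
mean_obs = sum(
    plug_in_entropy(random.choices(range(len(true_p)), weights=true_p, k=n_sampled))
    for _ in range(n_reps)
) / n_reps
```

With small samples, rare alleles are frequently missed, so the average observed entropy `mean_obs` falls below the true value `true_h`; the gap shrinks as the sample size grows.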

Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity Dβ and differentiation ΔD measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.

The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.

Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural

TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

Level                      Diversity
3 Hawaiian Archipelago     $D_\gamma = 8.404$
2 Region                   $D_\gamma^{(2)} = D_\gamma$;  $D_\alpha^{(2)} = 8.290$;  $D_\beta^{(2)} = 1.012$
1 Island (community)       $D_\gamma^{(1)} = D_\alpha^{(2)}$;  $D_\alpha^{(1)} = 7.690$;  $D_\beta^{(1)} = 1.065$

Level                      Differentiation among aggregates at each level
2 Region                   $\Delta_D^{(2)} = 0.014$
1 Island (community)       $\Delta_D^{(1)} = 0.033$


populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides “true diversity” measures that are consistent across levels of organization, and therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
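The percentages quoted for genetic diversity appear to correspond to the relative drop in alpha diversity from the regional level to the island level. The short sketch below checks this reading against the values in Tables 5 and 6. Both the interpretation and the decimal placement of the tabulated values (8.083, 7.077, 8.290, 7.690) are assumptions of this illustration.

```python
def relative_drop(d_alpha_region, d_alpha_island):
    """Percentage decrease in effective number of alleles, region to island."""
    return 100.0 * (d_alpha_region - d_alpha_island) / d_alpha_region

# Alpha diversities of order q = 1 as read from Tables 5 and 6.
drop_etelis = relative_drop(8.083, 7.077)     # Etelis coruscans
drop_zebrasoma = relative_drop(8.290, 7.690)  # Zebrasoma flavescens
```

Under these assumptions the results are roughly 12.4% and 7.2%, in line with the figures given in the text.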

In our hierarchical framework and analysis based on the Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)’s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)’s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon’s entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.
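As a hedged illustration of the transformation mentioned above, the sketch below computes the phylogenetic entropy I = −Σ L_i a_i ln a_i over the branches of an ultrametric tree of depth T and converts it to an effective number of lineages as exp(I/T), following the mean phylogenetic diversity of order q = 1 of Chao et al. (2010). Whether Table 3 uses this scaling or the branch-length version T·exp(I/T) is not stated in this section, so the scaling is an assumption; the two toy trees are hypothetical.

```python
import math

def phylogenetic_entropy(branches):
    """Phylogenetic entropy: branches is a list of (L_i, a_i) pairs, where
    L_i is a branch length and a_i the total relative abundance descended
    from that branch."""
    return -sum(length * a * math.log(a) for length, a in branches if a > 0)

def mean_phylo_diversity_q1(branches, depth):
    """Effective number of equally divergent lineages (order q = 1)."""
    return math.exp(phylogenetic_entropy(branches) / depth)

# Hypothetical tree ((s1, s2), (s3, s4)) of depth T = 2, equal abundances:
# four tip branches of length 1 and two internal branches of length 1.
nested = [(1.0, 0.25)] * 4 + [(1.0, 0.5)] * 2
# Hypothetical star tree of the same depth: four tip branches of length 2.
star = [(2.0, 0.25)] * 4

pd_nested = mean_phylo_diversity_q1(nested, depth=2.0)
pd_star = mean_phylo_diversity_q1(star, depth=2.0)
```

For the star tree, where all species are equally distinct, the measure reduces to the ordinary Hill number of order 1 (four species). Shared ancestry in the nested tree lowers the effective number to 2^1.5, about 2.83.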

There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithms may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and “true dissimilarity” properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.

The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.

An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.


ACKNOWLEDGEMENTS

This work was assisted through participation in the “Next Generation Genetic Monitoring” Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center’s Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.

DATA ARCHIVING STATEMENT

All data used in this manuscript are available in DRYAD (https://doi.org/dx.doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).

ORCID

Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493

Christine Edwards http://orcid.org/0000-0001-8837-4872

REFERENCES

Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2

Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101

Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.

Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis “marshi” (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665

Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.

Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.

Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654

Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551

Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247

Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272

Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540

Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349

Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471

Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014

Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819

Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083

Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1

Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2

Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036

Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75

Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352

Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x

Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1

Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x

Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1

Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x

Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum × falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.


Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.

Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.

Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3

Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x

Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.

Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299

Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x

Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x

Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.

Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x

Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.

Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.

Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696

Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B: Biological Sciences, 283(1829).

Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a “pristine” coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z

Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765

Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x

Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.

Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422

Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756

de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705

Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.

Whitlock, M. C. (2011). G’ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x

Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516

Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.

Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593

APPENDIX S1 Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an “ideal” population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the “effective” number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
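The equivalence stated in this example can be verified with a few lines of Python. The sketch assumes the usual definitions of expected heterozygosity, He = 1 − Σ p_i², and of the effective number of alleles of order q = 2, 1/Σ p_i² = 1/(1 − He); the function names are hypothetical and the frequencies are those of the table above.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity, He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def effective_alleles_q2(freqs):
    """Effective number of alleles of order q = 2, i.e. 1 / (1 - He)."""
    return 1.0 / sum(p * p for p in freqs)

pop1 = [0.5, 0.5, 0.0, 0.0, 0.0]
pop2 = [0.01, 0.10, 0.665, 0.215, 0.01]

he1, he2 = expected_heterozygosity(pop1), expected_heterozygosity(pop2)
ne1, ne2 = effective_alleles_q2(pop1), effective_alleles_q2(pop2)
```

Both populations have He of about 0.5 and therefore an effective number of alleles of about two, even though population 2 carries five distinct alleles.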

APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol                      Definition
${}^{q}D$                   abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$                  phylogenetic diversity of order q
$S$                         total number of distinct elements (species/alleles)
$H$                         Shannon entropy
$K$                         number of regions
$J_k$                       number of local populations/communities within region k
$w_{jk}$                    weight given to population/community j of region k
$w_{+k}$                    weight given to region k
$D_\alpha^{(l)}$            alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$             beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$            gamma abundance diversity at level l of the hierarchy
$l$                         superscript to denote population/community, subregion, region, …
$D_\gamma$                  gamma abundance diversity at the ecosystem level
$N_{injk}$                  number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$                   total number of copies of allele/species i in population/community j of region k
$N_{+jk}$                   total number of alleles/individuals in population j of region k
$N_{++k}$                   total number of alleles/individuals in region k
$N_{+++}$                   total number of alleles/individuals in the ecosystem
$p_{i|jk}$                  frequency of allele/species i in population j of region k
$p_{i|+k}$                  frequency of allele/species i in region k
$p_{i|++}$                  frequency of allele/species i in the ecosystem
$H_{\alpha,jk}^{(1)}$       alpha entropy for population j of region k
$H_{\alpha,k}^{(2)}$        alpha entropy for region k
$H_\alpha^{(l)}$            total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$            abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$                         number of branch segments in the phylogenetic tree
$L_i$                       length of branch i = 1, 2, 3, ⋯, B
$a_i$                       total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$                   mean branch length
$T$                         depth of an ultrametric tree (= $\bar{T}$)
$Q$                         Rao’s quadratic entropy
$I$                         phylogenetic entropy
$a_{i|jk}$                  total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$                  total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$                  total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$           alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$            beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$           gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$                 gamma phylogenetic diversity at the ecosystem level
$I_{\alpha,jk}^{(1)}$       alpha phylogenetic entropy for population j of region k
$I_{\alpha,k}^{(2)}$        alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$            total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$         phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)

APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term “allele frequency” with “species frequency.” We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \cdots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{k=1}^{J} w_k p_{i|k}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1 For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \ln x$ is a concave function, it follows from Jensen's inequality that for any allele i we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. Jensen's inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows. Because $\sum_{m} w_m p_{i|m} \ge w_j p_{i|j}$ for every j and the logarithm is increasing,

$$
\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big) \Big) \\
&\le -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \ln (w_j p_{i|j}) \Big) \\
&= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j \\
&= H_\alpha - \sum_{j=1}^{J} w_j \ln w_j .
\end{aligned}
$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
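The bounds in the theorem can be checked numerically. The following is a minimal R sketch, not part of the original supplement: the helper names `shannon` and `two_level`, the weights and the frequency matrices are all hypothetical and chosen only to exercise the two extreme cases and one intermediate case.

```r
# Shannon entropy (natural log) of a frequency vector, ignoring zeros
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Excess entropy H_gamma - H_alpha and its upper bound -sum(w * log(w)).
# P: alleles-by-populations matrix of relative frequencies (columns sum to 1)
# w: population weights (sum to 1)
two_level <- function(P, w) {
  H_gamma <- shannon(as.vector(P %*% w))      # entropy of pooled frequencies
  H_alpha <- sum(w * apply(P, 2, shannon))    # weighted mean within-population entropy
  c(excess = H_gamma - H_alpha, bound = shannon(w))
}

w <- c(0.2, 0.3, 0.5)                         # hypothetical weights

# Case 1: identical populations, so the excess should be 0
P_same <- matrix(rep(c(0.5, 0.3, 0.2), 3), ncol = 3)
same <- two_level(P_same, w)

# Case 2: no shared alleles, so the excess should reach the bound
P_distinct <- matrix(0, nrow = 6, ncol = 3)
P_distinct[1:2, 1] <- c(0.6, 0.4)
P_distinct[3:4, 2] <- c(0.5, 0.5)
P_distinct[5:6, 3] <- c(0.9, 0.1)
distinct <- two_level(P_distinct, w)

# Case 3: partial sharing, so the excess should lie strictly in between
P_mixed <- cbind(c(0.5, 0.5, 0, 0), rep(0.25, 4), c(0, 0, 0.5, 0.5))
mixed <- two_level(P_mixed, w)
```

Dividing `excess` by `bound` gives a quantity between 0 and 1, which is the motivation for the normalization that follows.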


From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance.
Here the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: if multiple communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.

Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in a three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, the Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix B for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions, with allele relative frequency $p_{i|+k}$ for allele i in region k and region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} w_{+k} p_{i|+k}\Big) \ln\Big(\sum_{m=1}^{K} w_{+m} p_{i|+m}\Big).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations, with allele relative frequency $p_{i|jk}$ for allele i in population j and population weights $(w_{1k}/w_{+k}, w_{2k}/w_{+k}, \ldots, w_{J_k k}/w_{+k})$, j = 1, 2, …, $J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \tag{C6}$$

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$
\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \Big( \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln (w_{jk} p_{i|jk}) \Big) \\
&= -\sum_{i=1}^{S} \Big( \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big[ \ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k} \Big] \Big) \\
&= -\sum_{i=1}^{S} \Big( \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk} \Big)
 - \sum_{i=1}^{S} \Big( \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \frac{w_{jk}}{w_{+k}} \Big)
 - \sum_{i=1}^{S} \Big( \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k} \Big) \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big].
\end{aligned}
$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
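The two normalized measures in Eqs. (C5) and (C6) can be illustrated with a small R sketch. It is not the authors' iDIP code: the function name `shannon_diff3`, the equal weights and the toy frequency matrices are assumptions made for illustration. The denominator of Eq. (C6) is computed as the entropy of the population weights minus the entropy of the region weights, which is algebraically the same quantity.

```r
# Shannon entropy (natural log), ignoring zero frequencies
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# P: alleles-by-populations relative frequencies (columns sum to 1)
# w: population weights w_jk (sum to 1); region: region label of each population
shannon_diff3 <- function(P, w, region) {
  regs  <- unique(region)
  w_reg <- sapply(regs, function(r) sum(w[region == r]))           # w_{+k}
  P_reg <- sapply(regs, function(r) {                              # p_{i|+k}
    idx <- region == r
    as.vector(P[, idx, drop = FALSE] %*% w[idx]) / sum(w[idx])
  })
  H_gamma  <- shannon(as.vector(P %*% w))
  H_alpha2 <- sum(w_reg * apply(P_reg, 2, shannon))
  H_alpha1 <- sum(w * apply(P, 2, shannon))
  max2 <- shannon(w_reg)                   # -sum_k w_{+k} ln w_{+k}
  max1 <- shannon(w) - shannon(w_reg)      # -sum_k sum_j w_jk ln(w_jk / w_{+k})
  c(Delta2 = (H_gamma - H_alpha2) / max2,
    Delta1 = (H_alpha2 - H_alpha1) / max1)
}

region <- c("R1", "R1", "R2", "R2", "R2")  # hypothetical hierarchy
w <- rep(0.2, 5)                           # hypothetical equal weights

# All populations identical: both measures should be 0
P_same <- matrix(rep(c(0.4, 0.3, 0.2, 0.1), 5), ncol = 5)
d_same <- shannon_diff3(P_same, w, region)

# One private allele per population (no sharing): both measures should be 1
P_distinct <- diag(5)
d_distinct <- shannon_diff3(P_distinct, w, region)
```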

As shown in the proof of Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.

Phylogenetic differentiation measures in a three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i} L_i\, a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i} L_i\, a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i} L_i\, a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[I_\alpha^{(2)} - I_\alpha^{(1)}\big] + \big[I_\gamma - I_\alpha^{(2)}\big]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.

SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represent the ecosystem. Shannon entropies H∗ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 diagram. Rows: level 3 (Ecosystem), level 2 (Region), level 1 (Population or Community). Columns: Within, Between, Total, Decomposition. Ecosystem frequencies p_{i|++} with entropy H_γ; region frequencies p_{i|+1}, p_{i|+2} with entropies H_{α1}^{(2)}, H_{α2}^{(2)} and H_α^{(2)}; population frequencies p_{i|11}, p_{i|21}, p_{i|31}, p_{i|42}, p_{i|52} with entropies H_{α11}^{(1)}, H_{α21}^{(1)}, H_{α31}^{(1)}, H_{α42}^{(1)}, H_{α52}^{(1)} and H_α^{(1)}.]

Figure S2. Example of an ultrametric tree in which the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set in which the first elements (5 in the present example) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for the 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2 diagram. Tips: p1, p2, p3, p4, p5. Internal nodes: p1 + p2, p1 + p2 + p3 and p4 + p5. Root: MRCA.]
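The extended abundance set described for Figure S2 can be built with a node-by-tip membership matrix. The R sketch below is illustrative only: the tip abundances `p` and the branch lengths `L` are hypothetical (chosen so that the tree is ultrametric with depth 3), and the entropy expression mirrors the form used in the iDIPphylo listing of this supplement rather than reproducing it.

```r
p <- c(0.4, 0.3, 0.1, 0.1, 0.1)   # hypothetical relative abundances p1..p5

# Rows are the 8 nodes (5 tips, then p1+p2, p1+p2+p3, p4+p5); columns are tips.
# An entry of 1 means the tip descends from (or is) that node.
M <- rbind(diag(5),
           c(1, 1, 0, 0, 0),
           c(1, 1, 1, 0, 0),
           c(0, 0, 0, 1, 1))
a <- as.vector(M %*% p)           # extended abundance set a1..a8

# Hypothetical branch lengths leading to each node (ultrametric, depth 3):
# tips 1, 2, 4, 5 have length 1, tip 3 has length 2,
# node (1,2) has 1, node (1,2,3) has 1, node (4,5) has 2
L <- c(1, 1, 2, 1, 1, 1, 1, 2)

T_depth <- sum(L * a)             # abundance-weighted mean root-to-tip distance
G   <- -sum(L * a / T_depth * log(a / T_depth))
PD1 <- exp(G)                     # effective total branch length of order q = 1
```

For an ultrametric tree, `T_depth` equals the tree depth whatever the abundances are, and `PD1` cannot exceed the total branch length `sum(L)`.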

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we use a three-level hierarchical structure to illustrate how to input data. Consider two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as follows:

    Ecosystem
        Region 1:  Population 1, Population 2
        Region 2:  Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

             Pop1  Pop2  Pop3  Pop4  Pop5
    Allele1     1    16     2    10    15
    Allele2     0     0     0     5    14
    Allele3     7    12    11     1     0
    Allele4     0     5    14     1    21
    Allele5     2     1     0    11    10
    Allele6     0     1     3     2     0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure with any number of levels can be expressed in a similar manner.

             Pop1         Pop2         Pop3         Pop4         Pop5
    Level3   Ecosystem    Ecosystem    Ecosystem    Ecosystem    Ecosystem
    Level2   Region1      Region1      Region2      Region2      Region2
    Level1   Population1  Population2  Population3  Population4  Population5

For the above simple example, input data for the R function iDIP (code given below) are shown below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

    # Run R function
    iDIP(Data, Struc)

Output is given below:

                        [,1]
    D_gamma            5.272
    D_alpha2           4.679
    D_alpha1           3.540
    D_beta2            1.127
    D_beta1            1.322
    Proportion2        0.300
    Proportion1        0.700
    Differentiation2   0.204
    Differentiation1   0.310

We give simple interpretations for the above output ("effective" is always meant in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).

(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).

(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
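The reported values can be recomputed directly from the entropy formulas of Appendix S3, without the iDIP function. The sketch below is an independent re-derivation under two assumptions: population and region weights are relative sizes (as in the iDIP listing), and the decimal points of the printed output are as restored above. Variable names such as `H_alpha2` and `diff2` are introduced here for illustration.

```r
# Example allele counts (alleles in rows, populations in columns)
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)                 # region of each population

# Shannon entropy of a count vector (normalized internally, zeros ignored)
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

n <- sum(Data)
Reg <- t(rowsum(t(Data), region))          # alleles-by-regions pooled counts

H_gamma  <- shannon(rowSums(Data))
H_alpha2 <- sum(colSums(Reg)  / n * apply(Reg,  2, shannon))
H_alpha1 <- sum(colSums(Data) / n * apply(Data, 2, shannon))

# Effective numbers and multiplicative beta components
D_gamma  <- exp(H_gamma)
D_alpha2 <- exp(H_alpha2)
D_alpha1 <- exp(H_alpha1)
D_beta2  <- D_gamma  / D_alpha2
D_beta1  <- D_alpha2 / D_alpha1

# Share of total beta information at each level
prop2 <- (H_gamma  - H_alpha2) / (H_gamma - H_alpha1)
prop1 <- (H_alpha2 - H_alpha1) / (H_gamma - H_alpha1)

# Normalized differentiation, Eqs. (C5) and (C6)
W_reg <- shannon(colSums(Reg))             # entropy of region weights
W_pop <- shannon(colSums(Data))            # entropy of population weights
diff2 <- (H_gamma  - H_alpha2) / W_reg
diff1 <- (H_alpha2 - H_alpha1) / (W_pop - W_reg)

round(c(D_gamma, D_alpha2, D_alpha1, D_beta2, D_beta1, prop2, prop1, diff2, diff1), 3)
```

The rounded vector should agree with the output listed above; the two proportions sum to 1 by construction.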

    # R code for Information-based Diversity Partitioning for Any Number of Levels
    #
    # Input data:
    #   abun:  alleles-by-population frequency matrix (or species-by-community frequency matrix)
    #   struc: level-by-population hierarchical structure matrix (or level-by-community
    #          hierarchical structure matrix)
    # Output:
    #   (1) Gamma (or total) diversity, alpha and beta diversity at each level
    #   (2) Proportion of total beta information found at each level
    #   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
    #       measures the mean dissimilarity among populations (level 1) within a region;
    #       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
    # NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
    #       or "NA" in your data by 0 or any numerical value. Also, there must be at least
    #       one species/allele in a population or a community.

    # Main function iDIP
    iDIP = function(abun, struc) {
      n = sum(abun); N = ncol(abun)
      ga = rowSums(abun)
      gp = ga[ga > 0] / n                      # pooled (ecosystem) relative frequencies
      G = sum(-gp * log(gp))                   # gamma entropy
      S = length(gp)

      H = nrow(struc)                          # number of hierarchical levels
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      # Lowest level: populations (weights are relative population sizes)
      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[H - 1] = sum(wi * Ai)

      # Intermediate levels: pool populations within each aggregate
      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          ai = matrix(0, ncol = NN, nrow = nrow(abun))
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              ai[, j] = abun[, II]
            } else {
              ai[, j] = rowSums(abun[, II])
            }
          }
          pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
          wi = colSums(ai) / sum(ai)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
          A[i - 1] = sum(wi * Ai)
        }
      }

      total = G - A[H - 1]                     # total beta information
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      Gamma = exp(G); Alpha = exp(A)
      out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) <- c(paste0("D_gamma"),
                         paste0("D_alpha", (H - 1):1),
                         paste0("D_beta", (H - 1):1),
                         paste0("Proportion", (H - 1):1),
                         paste0("Differentiation", (H - 1):1))
      return(out)
    }

Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) described for species diversity, we also need to input a phylogenetic "Tree" in Newick tree format.

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    rownames(Data) = paste0("Allele",1:6)
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
    Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

    # Run R function
    iDIPphylo(Data, Struc, Tree)

Output is given below:

                     [,1]
    Faiths PD     321.525
    mean_T         94.169
    PD_gamma      274.388
    PD_alpha2     255.194
    PD_alpha1     223.231
    PD_beta2        1.075
    PD_beta1        1.143
    PD_prop2        0.351
    PD_prop1        0.649
    PD_diff2        0.124
    PD_diff1        0.149

"Effective" in the following interpretation is meant in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma).

(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2).

(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
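The first two output rows can be checked by hand from the branch lengths of the example tree. The R sketch below assumes that the decimal points of the Newick string, which were lost in extraction, sit after the integer part so that branch lengths read 16.66254448, 28.86156926 and so on. That placement is an assumption, tested here against the reported Faith's PD and mean_T. The vector names are introduced for illustration and no tree-handling package is used.

```r
# Terminal branch lengths (assumed decimal placement, see lead-in)
tip_len <- c(Allele1 = 16.66254448, Allele2 = 28.86156926, Allele3 = 59.19367445,
             Allele4 = 9.67060281,  Allele5 = 49.65919121, Allele6 = 15.361314)

# Internal branch lengths: clade (1,2), clade ((1,2),3), clade (4,5,6)
node_len <- c(n12 = 43.70264926, n123 = 43.49065302, n456 = 54.92297125)

# Faith's PD: total branch length of the tree
faith_pd <- sum(tip_len) + sum(node_len)

# Root-to-tip distance of each allele: its own branch plus its ancestral branches
root_to_tip <- tip_len + c(node_len["n12"] + node_len["n123"],
                           node_len["n12"] + node_len["n123"],
                           node_len["n123"],
                           node_len["n456"], node_len["n456"], node_len["n456"])

# Pooled allele frequencies from the example count matrix
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
p <- rowSums(Data) / sum(Data)

# Abundance-weighted mean root-to-tip distance
mean_T <- sum(p * root_to_tip)

round(c(faith_pd = faith_pd, mean_T = mean_T), 3)
```

Because the tips lie at different distances from the root, this simulated tree is not ultrametric, which is why mean_T is an abundance-weighted average rather than a single tree depth.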

    # R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
    #
    # Input data:
    #   abun:  alleles-by-population frequency matrix (or species-by-community frequency matrix)
    #   struc: level-by-population hierarchical structure matrix (or level-by-community
    #          hierarchical structure matrix)
    #   tree:  a Newick-format phylogenetic tree spanned by all focal species considered in a study
    # NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in
    #       your data by 0 or any numerical value. Also, there must be at least one
    #       species/allele in a population or a community.
    # Output:
    #   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
    #   (2) mean_T: weighted (by species abundance) mean of the distances from the root node
    #       to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
    #   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level
    #   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information
    #       found in Level 1 and Level 2, respectively
    #   (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1
    #       measures the mean phylogenetic dissimilarity among communities (Level 1) within a
    #       region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among
    #       regions (Level 2), etc.

    # Three packages ("ade4", "ape" and "phytools") must be installed first
    install.packages("ade4")
    library(ade4)
    install.packages("ape")
    library(ape)
    install.packages("phytools")
    library(phytools)

    # Main function iDIPphylo
    iDIPphylo = function(abun, struc, tree) {
      phyloData <- newick2phylog(tree)
      Temp <- as.matrix(abun[names(phyloData$leaves), ])
      nodenames = c(names(phyloData$leaves), names(phyloData$nodes))

      # M[leaf, node] = 1 if the node lies on the path from the root to the leaf
      M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
                 dimnames = list(names(phyloData$leaves), nodenames))
      for (i in 1:length(phyloData$leaves)) {
        M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
      }

      # Node-by-population abundances (extended abundance set)
      pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
                  dimnames = list(nodenames, colnames(abun)))
      for (i in 1:ncol(abun)) {
        pA[, i] = Temp[, i] %*% M
      }
      pB = c(phyloData$leaves, phyloData$nodes)   # branch lengths

      n = sum(abun); N = ncol(abun)
      ga = rowSums(pA)
      gp = ga / n; TT = sum(gp * pB)              # TT: mean root-to-tip distance
      G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
      PD = sum(pB[gp > 0])                        # Faith's PD

      H = nrow(struc)
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[H - 1] = sum(wi * Ai)

      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
            } else {
              pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
            }
          }
          # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))   # inactive: 'ai' is not defined here
          wi = ni / sum(ni)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
          A[i - 1] = sum(wi * Ai)
        }
      }

      total = G - A[H - 1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      # Gamma = exp(G) / TT; Alpha = exp(A) / TT   # inactive alternative scaling
      Gamma = exp(G); Alpha = exp(A)
      out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
      # out1 = iDIP(abun, struc); out = cbind(out1, out2)   # inactive: 'out2' is not defined
      rownames(out) <- c(paste("Faiths PD"),
                         paste("mean_T"),
                         paste0("PD_gamma"),
                         paste0("PD_alpha", (H - 1):1),
                         paste0("PD_beta", (H - 1):1),
                         paste0("PD_prop", (H - 1):1),
                         paste0("PD_diff", (H - 1):1))
      return(out)
    }


of uninhabited low islands, atolls, shoals and banks that are primarily only affected by global anthropogenic stressors (climate change, ocean acidification and marine debris) (Selkoe et al., 2009). In addition, the northerly location of the NWHI subjects the reefs there to harsher disturbance but higher productivity, and these conditions lead to ecological dominance of endemics over nonendemic fishes (Friedlander, Brown, Jokiel, Smith, & Rodgers, 2003). The Hawaiian archipelago is geographically remote, and its marine fauna is considerably less diverse than that of the tropical West and South Pacific (Randall, 1998). The nearest coral reef ecosystem is 800 km southwest of the MHI at Johnston Atoll and is the third region considered in our analysis of the Hawaiian reef ecosystem. Johnston's reef area is comparable in size to that of Maui Island in the MHI, and the fish composition of Johnston is regarded as most closely related to the Hawaiian fish community compared to other Pacific locations (Randall, 1998).

We first present results for species diversity of Hawaiian reef fishes, then for genetic diversity of two exemplar species of the fishes, and finally address associations between species and genetic diversities. Note that we did not consider phylogenetic diversity in this study because a phylogeny representing the Hawaiian reef fish community is unavailable.

7.1 | Species diversity

Table 4 presents the decomposition of fish species diversity of order q = 1. The effective number of species, $D_\gamma$, in the Hawaiian archipelago is 49. In itself, this number is not informative, but it would indeed be very useful if we wanted to compare the species diversity of the Hawaiian archipelago with that of other shallow-water coral reef ecosystems, for example, the Great Barrier Reef. Approximately 10 species equivalents are lost on descending to each lower diversity level in the hierarchy (Region: $D_\alpha^{(2)} = 37.77$; Island: $D_\alpha^{(1)} = 27.75$). Given that there are eight and nine islands, respectively, in MHI and NWHI, one can interpret this by saying that, on average, each island contains a bit more than one endemic species equivalent. The beta diversity $D_\beta^{(2)} = 1.29$ represents the number of region equivalents in the Hawaiian archipelago, while $D_\beta^{(1)} = 1.361$ is the average number of island equivalents within a region. However, these beta diversities depend on the actual numbers of regions/populations as well as on the sizes (weights) of each region/population. Thus, they need to be normalized so as to obtain $\Delta_D$ (see bottom section of Table 3) to quantify compositional differentiation. Based on Table 4, the extent of this compositional differentiation, in terms of the mean proportion of nonshared species, is 0.29 among the three regions (MHI, NWHI and Johnston) and 0.15 among islands within a region. Thus, there is almost twice as much differentiation among regions as among islands within a region.
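The multiplicative relation between the gamma, alpha and beta components quoted in this paragraph can be verified with a few lines of R. This is only a worked arithmetic check using the effective numbers reported in Table 4, with decimal points placed as the text implies. The variable names are illustrative.

```r
# Effective numbers of species reported in Table 4
D_gamma  <- 48.744   # Hawaiian archipelago
D_alpha2 <- 37.773   # per region
D_alpha1 <- 27.752   # per island

# Beta diversity at a level is gamma at that level divided by alpha at that level
D_beta2 <- D_gamma  / D_alpha2   # number of region equivalents
D_beta1 <- D_alpha2 / D_alpha1   # island equivalents per region

# Species equivalents lost when descending one level
lost <- c(region = D_gamma - D_alpha2, island = D_alpha2 - D_alpha1)

round(c(D_beta2 = D_beta2, D_beta1 = D_beta1), 3)
round(lost, 1)
```

The two losses come out near 11 and 10 species equivalents, in line with the "approximately 10" stated in the text.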

We can gain more insight about dominance and other assemblage characteristics by comparing diversity measures of different orders (q = 0, 1, 2) at the individual island level (Figure 5a). This is so because the contribution of rare alleles/species to diversity decreases as q increases. Species richness (diversity of order q = 0) is much larger than those of order q = 1, 2, which indicates that all islands contain several rare species. Conversely, diversities of order q = 1 and 2 for Nihoa (and, to a lesser extent, Necker) are very close, indicating that the local community is dominated by few species. Indeed, in Nihoa, the relative density of one species, Chromis vanderbilti, is 55.1%.
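How the order q changes the weight given to rare species can be seen with a short R sketch of Hill numbers. The function name `hill` and both abundance vectors are hypothetical. Only the 55.1% share of the dominant species is taken from the text, and the remaining 24 species with equal shares are invented for illustration, so the numbers are not the Nihoa results.

```r
# Hill number (effective number of species/alleles) of order q
hill <- function(p, q) {
  p <- p[p > 0]
  p <- p / sum(p)
  if (isTRUE(all.equal(q, 1))) {
    exp(-sum(p * log(p)))            # limit q -> 1: exponential of Shannon entropy
  } else {
    sum(p^q)^(1 / (1 - q))           # q = 0: richness; q = 2: inverse Simpson
  }
}

orders <- c(0, 1, 2)

# Perfectly even community of 10 species: every order gives 10
even <- rep(1, 10)
D_even <- sapply(orders, function(q) hill(even, q))

# Hypothetical dominated community: one species at 55.1%, 24 rare species
dominated <- c(0.551, rep(0.449 / 24, 24))
D_dom <- sapply(orders, function(q) hill(dominated, q))

round(rbind(even = D_even, dominated = D_dom), 2)
```

In the dominated community the values drop steeply from q = 0 to q = 2, because higher orders increasingly discount the rare species.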

Finally, species diversity is larger in MHI than in NWHI (Figure 6a). Possible explanations for this include better sampling effort in the MHI and higher average physical complexity of the reef habitat in the MHI (Friedlander et al., 2003). Reef complexity and environmental conditions may also lead to more evenness in the MHI. For instance, the local adaptation of NWHI endemics allows them to numerically dominate the fish community, and this skews the species abundance distribution to the left, whereas in the MHI, the more typical tropical conditions may lead to competitive equivalence of many species. Although MHI have greater human disturbance than NWHI, each island has some areas of low human impact, and this may prevent human impact from influencing island-level species diversity.

7.2 | Genetic Diversity

Tables 5 and 6 present the decomposition of genetic diversity for Etelis coruscans and Zebrasoma flavescens, respectively. They both maintain similar amounts of genetic diversity at the ecosystem level, about eight allele equivalents, and in both cases genetic diversity at the regional level is only slightly higher than that maintained at the island level (less than one allele equivalent higher), a pattern that contrasts with what is observed for species diversity (see above). Finally, both species exhibit similar patterns of genetic structuring, with differentiation between regions being less than half that observed

TABLE 4  Decomposition of fish species diversity of order q = 1 and differentiation measures for the Hawaiian coral reef ecosystem

    Level                      Diversity
    3 Hawaiian Archipelago     D_γ = 48.744
    2 Region                   D_γ^(2) = D_γ;      D_α^(2) = 37.773;  D_β^(2) = 1.290
    1 Island (community)       D_γ^(1) = D_α^(2);  D_α^(1) = 27.752;  D_β^(1) = 1.361

    Differentiation among aggregates at each level
    2 Region                   Δ_D^(2) = 0.290
    1 Island (community)       Δ_D^(1) = 0.153


among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that, despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than in Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.

Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.

As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).

8 | DISCUSSION

Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.

Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and therefore give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g.,

FIGURE 5  Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities, (b) genetic diversity for Etelis coruscans, (c) genetic diversity for Zebrasoma flavescens

[Figure 5 panels: (a) species diversity; (b) E. coruscans; (c) Z. flavescens]


bottlenecks)Thatsaid it isstillveryuseful tocharacterizediversityof local populations and communities using Hill numbers of orderq=012toobtainacomprehensivedescriptionofbiodiversityatthisscaleForexampleadiversityoforderq = 0 much larger than those of order q=12indicatesthatpopulationscommunitiescontainsev-eralrareallelesspeciessothatallelesspeciesrelativefrequenciesarehighly uneven Also very similar diversities of order q = 1 2 indicate that thepopulationcommunity isdominatedby fewallelesspeciesWeexemplifythisusewiththeanalysisoftheHawaiianarchipelagodata set (Figure5)A continuous diversity profilewhich depictsHill

numberwithrespecttotheorderqge0containsallinformationaboutallelesspeciesabundancedistributions

As proved by Chao etal (2015 appendixS6) and stated inAppendixS3 information-based differentiation measures such asthoseweproposehere(Table3)possesstwoessentialmonotonicitypropertiesthatheterozygosity-baseddifferentiationmeasureslack(i)theyneverdecreasewhenanewunsharedalleleisaddedtoapopula-tionand(ii)theyneverdecreasewhensomecopiesofasharedallelearereplacedbycopiesofanunsharedalleleChaoetal(2015)provideexamplesshowingthatthecommonlyuseddifferentiationmeasuresof order q = 2 such as GSTandJostrsquosDdonotpossessanyofthesetwoproperties

Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and

FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness; (b) allelic richness for Etelis coruscans; (c) allelic richness for Zebrasoma flavescens

TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level                       Diversity
3 Hawaiian Archipelago      Dγ = 8.249
2 Region                    Dγ(2) = Dγ;  Dα(2) = 8.083;  Dβ(2) = 1.016
1 Island (population)       Dγ(1) = Dα(2);  Dα(1) = 7.077;  Dβ(1) = 1.117

Differentiation among aggregates at each level
2 Region                    ΔD(2) = 0.023
1 Island (community)        ΔD(1) = 0.062


our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.

Recently, Karlin and Smouse (2017, Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: If N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
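The two-population example can be checked numerically. The sketch below assumes equal population weights (0.5 each) and uses the two-level Shannon differentiation, i.e., the difference between gamma and alpha entropies divided by the entropy of the weights; the function names and allele labels are hypothetical and serve only as illustration.

```python
import math

def shannon_entropy(freqs):
    """Shannon entropy (natural log) of a set of relative frequencies."""
    return -sum(x * math.log(x) for x in freqs if x > 0)

def shannon_differentiation(pops, weights):
    """(H_gamma - H_alpha) / (-sum_j w_j ln w_j) for a two-level hierarchy.

    pops: list of dicts mapping allele label -> relative frequency.
    weights: population weights summing to 1.
    """
    alleles = sorted(set().union(*pops))
    # Pooled (ecosystem-level) frequency of each allele.
    pooled = [sum(w * pop.get(a, 0.0) for pop, w in zip(pops, weights)) for a in alleles]
    h_gamma = shannon_entropy(pooled)
    h_alpha = sum(w * shannon_entropy(pop.values()) for pop, w in zip(pops, weights))
    return (h_gamma - h_alpha) / shannon_entropy(weights)

# Populations I and II: 10 equally frequent alleles each, 4 of them shared.
shared = ["s1", "s2", "s3", "s4"]
pop_I = {a: 0.1 for a in shared + ["a1", "a2", "a3", "a4", "a5", "a6"]}
pop_II = {a: 0.1 for a in shared + ["b1", "b2", "b3", "b4", "b5", "b6"]}

delta = shannon_differentiation([pop_I, pop_II], [0.5, 0.5])
print(round(delta, 6))  # 1 - A/S = 1 - 4/10
```

Under these assumptions the measure returns 0.6, the proportion of nonshared alleles in each population.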

In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would surely be observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.

Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity Dβ and differentiation ΔD measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.

The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.

Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural

TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

Level                       Diversity
3 Hawaiian Archipelago      Dγ = 8.404
2 Region                    Dγ(2) = Dγ;  Dα(2) = 8.290;  Dβ(2) = 1.012
1 Island (community)        Dγ(1) = Dα(2);  Dα(1) = 7.690;  Dβ(1) = 1.065

Differentiation among aggregates at each level
2 Region                    ΔD(2) = 0.014
1 Island (community)        ΔD(1) = 0.033


populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization, and therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
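The percentages quoted for genetic diversity can be approximately recovered from Tables 5 and 6. The sketch below assumes that "difference between regional and island levels" means the relative drop from the regional alpha diversity to the island alpha diversity; this reading is an assumption that happens to match the reported values up to rounding, and the function name is hypothetical.

```python
def relative_drop(d_alpha_region, d_alpha_island):
    """Percentage decrease from regional to island alpha diversity (assumed definition)."""
    return 100.0 * (d_alpha_region - d_alpha_island) / d_alpha_region

# Alpha diversities of order q = 1 as reported in Tables 5 and 6.
e_coruscans = relative_drop(8.083, 7.077)
z_flavescens = relative_drop(8.290, 7.690)

print(f"E. coruscans:  {e_coruscans:.2f}%")
print(f"Z. flavescens: {z_flavescens:.2f}%")
```

Small discrepancies with the values in the text are expected because the tabulated diversities are rounded averages over loci.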

In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al.'s (2010) phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al.'s (2010) measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.

There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level are transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithms may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.

The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.

An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and in doing so provides new tools to make inferences about biodiversity processes from observed spatial patterns.


ACKNOWLEDGEMENTS

This work was assisted through participation in the "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.

DATA ARCHIVING STATEMENT

All data used in this manuscript are available in DRYAD (https://doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).

ORCID

Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493

Christine Edwards http://orcid.org/0000-0001-8837-4872

REFERENCES

Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2

Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101

Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.

Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665

Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.

Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.

Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654

Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551

Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247

Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272

Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540

Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349

Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471

Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014

Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819

Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083

Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1

Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2

Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036

Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75

Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352

Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x

Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1

Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x

Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1

Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x

Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum × falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.

Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.

Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.

Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3

MacArthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x

Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.

Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299

Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x

Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x

Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.

Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x

Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.

Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.

Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696

Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B-Biological Sciences, 283(1829).

Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z

Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765

Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x

Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.

Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422

Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756

de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705

Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.

Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x

Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516

Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.

Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593


APPENDIX S1 Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
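The equivalence can be verified directly from the tabulated frequencies. The following sketch uses the standard relation between expected heterozygosity and the effective number of alleles of order q = 2, namely 1/(1 − He); the function names are hypothetical.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def effective_alleles(freqs):
    """Effective number of alleles of order q = 2, i.e. 1 / (1 - He)."""
    return 1.0 / sum(p * p for p in freqs)

# Allele frequencies of alleles A1..A5 from the table in Appendix S1.
pop1 = [0.5, 0.5, 0.0, 0.0, 0.0]
pop2 = [0.01, 0.10, 0.665, 0.215, 0.01]

he1, he2 = expected_heterozygosity(pop1), expected_heterozygosity(pop2)
ne1, ne2 = effective_alleles(pop1), effective_alleles(pop2)

print(f"He:  {he1:.3f} vs {he2:.3f}")
print(f"effective number of alleles: {ne1:.3f} vs {ne2:.3f}")
```

Both populations have an expected heterozygosity of about 0.5 and hence an effective number of alleles of about two, even though population 2 carries five distinct alleles.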

APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol: Definition

${}^{q}D$: abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$: phylogenetic diversity of order q
$S$: total number of distinct elements (species/alleles)
$H$: Shannon entropy
$K$: number of regions
$J_k$: number of local populations/communities within region k
$w_{jk}$: weight given to population/community j of region k
$w_{+k}$: weight given to region k
$D_\alpha^{(l)}$: alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$: beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$: gamma abundance diversity at level l of the hierarchy
$l$: superscript to denote population/community, subregion, region, …
$D_\gamma$: gamma abundance diversity at the ecosystem level
$N_{injk}$: number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$: total number of copies of allele/species i in population/community j of region k
$N_{+jk}$: total number of alleles/individuals in population j of region k
$N_{++k}$: total number of alleles/individuals in region k
$N_{+++}$: total number of alleles/individuals in the ecosystem
$p_{i|jk}$: frequency of allele/species i in population j of region k
$p_{i|+k}$: frequency of allele/species i in region k
$p_{i|++}$: frequency of allele/species i in the ecosystem
$H_{\alpha,jk}^{(1)}$: alpha entropy for population j of region k
$H_{\alpha,k}^{(2)}$: alpha entropy for region k
$H_\alpha^{(l)}$: total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$: abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$: number of branch segments in the phylogenetic tree
$L_i$: length of branch i = 1, 2, 3, ⋯, B
$a_i$: total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$: mean branch length
$T$: depth of an ultrametric tree ($= \bar{T}$)
$Q$: Rao's quadratic entropy
$I$: phylogenetic entropy
$a_{i|jk}$: total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$: total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$: total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$: alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$: beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$: gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$: gamma phylogenetic diversity at the ecosystem level
$I_{\alpha,jk}^{(1)}$: alpha phylogenetic entropy for population j of region k
$I_{\alpha,k}^{(2)}$: alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$: total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$: phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)


APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency." We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \cdots, w_J)$, with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

\[ H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{m=1}^{J} w_m p_{i|m} \Big) \]

and

\[ H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}. \]

Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

\[ 0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j. \]

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \ln x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

\[ -\Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}. \]

Summing over all alleles, we then obtain

\[ -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}. \]

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

\[ H_\gamma = -\sum_{i=1}^{S} \Big( \sum_{j=1}^{J} w_j p_{i|j} \Big) \ln \Big( \sum_{m=1}^{J} w_m p_{i|m} \Big) \le -\sum_{i=1}^{S} \sum_{j=1}^{J} w_j p_{i|j} \ln \big( w_j p_{i|j} \big) \]

\[ = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j. \]

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
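The two bounds of Theorem S3.1 can be checked numerically. The sketch below is an illustration under assumed inputs: the weights and the three sets of allele frequencies are hypothetical, and the function names are not part of the original derivation.

```python
import math

def entropy(freqs):
    """Shannon entropy (natural log); zero frequencies are skipped."""
    return -sum(x * math.log(x) for x in freqs if x > 0)

def entropy_gap(pops, weights):
    """H_gamma - H_alpha for populations given as equal-length frequency lists."""
    n_alleles = len(pops[0])
    pooled = [sum(w * pop[i] for w, pop in zip(weights, pops)) for i in range(n_alleles)]
    h_alpha = sum(w * entropy(pop) for w, pop in zip(weights, pops))
    return entropy(pooled) - h_alpha

weights = [0.5, 0.3, 0.2]   # hypothetical, unequal population weights
upper = entropy(weights)    # -sum_j w_j ln w_j

# Three hypothetical scenarios over six alleles.
identical = [[0.6, 0.3, 0.1, 0.0, 0.0, 0.0]] * 3
distinct = [[0.7, 0.3, 0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.5, 0.25, 0.25]]
overlapping = [[0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
               [0.0, 0.5, 0.5, 0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0, 0.4, 0.3, 0.3]]

gap_identical = entropy_gap(identical, weights)      # lower bound: 0
gap_distinct = entropy_gap(distinct, weights)        # upper bound: entropy of the weights
gap_overlapping = entropy_gap(overlapping, weights)  # strictly in between

print(gap_identical, gap_overlapping, gap_distinct, upper)
```

Identical populations attain the lower bound, populations with no shared alleles attain the upper bound, and partially overlapping populations fall in between.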


From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

\[ \Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1} \]

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:

(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.

(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: If multiple communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.

Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$
H_\gamma = -\sum_{i=1}^{S} p_{i|++}\ln p_{i|++}, \qquad
H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k},
$$

$$
H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk}.
$$

(See Appendix B for all notation.) The well-known additive decomposition for Shannon entropy is

$$
H_\gamma = H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right]. \tag{C2}
$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $\left[H_\alpha^{(2)} - H_\alpha^{(1)}\right]$ denotes the among-population information within a region, and $\left[H_\gamma - H_\alpha^{(2)}\right]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem C3: For the decomposition given in Eq. (C2) in the three-level hierarchy, we have the following inequalities:

$$
0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k}\ln w_{+k}, \tag{C3}
$$

$$
0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}}. \tag{C4}
$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\ln w_{+k}$.

Proof: To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are $K$ regions with allele relative frequency $p_{i|+k}$ for allele $i$ in region $k$, with the region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$
p_{i|++} = \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} = \sum_{k=1}^{K} w_{+k}\, p_{i|+k}.
$$

Then the "gamma" entropy for this two-level system is

$$
H_\gamma = -\sum_{i=1}^{S}\left(\sum_{k=1}^{K} w_{+k}\, p_{i|+k}\right)\ln\left(\sum_{m=1}^{K} w_{+m}\, p_{i|+m}\right).
$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k}\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem C1.

To prove Eq. (C4), we consider the following two-level hierarchy (region $k$ and all populations within region $k$): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele $i$ in population $j$, with population weights $(w_{1k}/w_{+k},\ w_{2k}/w_{+k},\ \ldots,\ w_{J_k k}/w_{+k})$, $j = 1, 2, \ldots, J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region $k$, i.e. $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding "alpha" entropy is

$$
\sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\, H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk}.
$$

Then Theorem C1 leads to

$$
H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\, H_{\alpha jk}^{(1)}
= -\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k} + \sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk}
\le -\sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\ln\frac{w_{jk}}{w_{+k}}.
$$

Summing over $k$ with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$
-\sum_{k=1}^{K} w_{+k}\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k} + \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk}
= H_\alpha^{(2)} - H_\alpha^{(1)}
\le -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}}.
$$

This proves Eq. (C4). From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e. the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$
\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k}\ln w_{+k}}. \tag{C5}
$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$
\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\left(w_{jk}/w_{+k}\right)}. \tag{C6}
$$
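The decomposition (C2) and the measures (C5) and (C6) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not a port of the authors' R function: the function and key names are hypothetical, weights are taken to be relative sample sizes ($w_{jk} = n_{jk}/n$), and at least two regions and at least one region with several populations are assumed so that both denominators are positive. The allele counts are those of the five-population example in the R code section of this supplement.

```python
import math

def entropy(counts):
    """Shannon entropy (natural log) of a vector of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def three_level_decomposition(abun, region_of):
    """abun[j] holds the allele counts of population j; region_of[j] is its region label."""
    n_alleles = len(abun[0])
    sizes = [sum(pop) for pop in abun]
    n = sum(sizes)
    # Pool counts within each region.
    regions = {}
    for pop, label in zip(abun, region_of):
        pooled = regions.setdefault(label, [0] * n_alleles)
        for i, c in enumerate(pop):
            pooled[i] += c
    region_sizes = {k: sum(v) for k, v in regions.items()}
    h_gamma = entropy([sum(pop[i] for pop in abun) for i in range(n_alleles)])
    h_alpha2 = sum(region_sizes[k] / n * entropy(v) for k, v in regions.items())
    h_alpha1 = sum(s / n * entropy(pop) for s, pop in zip(sizes, abun))
    # Upper bounds in (C3) and (C4), i.e. the denominators of (C5) and (C6).
    max2 = -sum(s / n * math.log(s / n) for s in region_sizes.values())
    max1 = -sum(s / n * math.log(s / region_sizes[k]) for s, k in zip(sizes, region_of))
    return {
        "D_gamma": math.exp(h_gamma),
        "D_alpha2": math.exp(h_alpha2),
        "D_alpha1": math.exp(h_alpha1),
        "Differentiation2": (h_gamma - h_alpha2) / max2,
        "Differentiation1": (h_alpha2 - h_alpha1) / max1,
    }

# Allele counts per population (Pop1..Pop5) from the worked example.
abun = [
    [1, 0, 7, 0, 2, 0],
    [16, 0, 12, 5, 1, 1],
    [2, 0, 11, 14, 0, 3],
    [10, 5, 1, 1, 11, 2],
    [15, 14, 0, 21, 10, 0],
]
result = three_level_decomposition(abun, ["Region1", "Region1", "Region2", "Region2", "Region2"])
```

With these inputs the sketch should agree, up to rounding, with the values reported for the worked example (gamma diversity of about 5.27 and regional differentiation of about 0.20).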

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared species). Note that in the latter case, we can decompose the gamma diversity as

$$
\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S}\left[\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk}\ln\left(w_{jk}\, p_{i|jk}\right)\right] \\
&= -\sum_{i=1}^{S}\left[\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk}\left(\ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k}\right)\right] \\
&= -\sum_{i=1}^{S}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk}\ln p_{i|jk}
 - \sum_{i=1}^{S}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk}\ln\frac{w_{jk}}{w_{+k}}
 - \sum_{i=1}^{S}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk}\ln w_{+k} \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k}\ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right].
\end{aligned}
$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\ln w_{+k}$.

As we proved in Theorem C3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem C2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. Shannon gamma and alpha entropies are expressed as (Table 3 of the main text)

$$
I_\gamma = -\sum_{i} L_i\, a_{i|++}\ln a_{i|++}, \qquad
I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\sum_{i} L_i\, a_{i|+k}\ln a_{i|+k},
$$

$$
I_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\sum_{i} L_i\, a_{i|jk}\ln a_{i|jk}.
$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$
I_\gamma = I_\alpha^{(1)} + \left[I_\alpha^{(2)} - I_\alpha^{(1)}\right] + \left[I_\gamma - I_\alpha^{(2)}\right]. \tag{C7}
$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $\left[I_\alpha^{(2)} - I_\alpha^{(1)}\right]$ denotes the among-population phylogenetic information within a region, and $\left[I_\gamma - I_\alpha^{(2)}\right]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem C4: For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth $T$:

$$
0 \le I_\gamma - I_\alpha^{(2)} \le -T\sum_{k=1}^{K} w_{+k}\ln w_{+k}, \tag{C8}
$$

$$
0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}}. \tag{C9}
$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln(w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T\sum_{k=1}^{K} w_{+k}\ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
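The two-level analogue of these bounds can be illustrated with a short Python sketch. Everything specific here is an assumption made for illustration: the helper names, the representation of the tree as a tip-by-branch incidence matrix, and the small ultrametric tree ((s1,s2),(s3,s4)) with unit branch lengths and depth T = 2. The normalization divides the among-population phylogenetic information by its maximum, $-T\sum_j w_j \ln w_j$.

```python
import math

def node_abundances(p, incidence):
    """a_b = summed relative abundance of the tips descended from branch b.

    incidence[s][b] is 1 if branch b lies on the path from the root to tip s.
    """
    n_branches = len(incidence[0])
    return [sum(p[s] * incidence[s][b] for s in range(len(p))) for b in range(n_branches)]

def phylo_entropy(a, lengths):
    """Phylogenetic entropy I = -sum_b L_b a_b ln a_b."""
    return -sum(L * x * math.log(x) for L, x in zip(lengths, a) if x > 0)

def phylo_differentiation(freqs, weights, incidence, lengths, depth):
    """Two-level normalized phylogenetic differentiation for an ultrametric tree."""
    n_tips = len(freqs[0])
    pooled = [sum(w * p[s] for w, p in zip(weights, freqs)) for s in range(n_tips)]
    i_gamma = phylo_entropy(node_abundances(pooled, incidence), lengths)
    i_alpha = sum(
        w * phylo_entropy(node_abundances(p, incidence), lengths)
        for w, p in zip(weights, freqs)
    )
    i_max = -depth * sum(w * math.log(w) for w in weights if w > 0)
    return (i_gamma - i_alpha) / i_max

# Hypothetical tree ((s1,s2),(s3,s4)): branches 0-3 are tip branches,
# branch 4 leads to (s1,s2) and branch 5 leads to (s3,s4); all lengths are 1.
incidence = [
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0, 1],
]
lengths = [1.0] * 6
weights = [0.5, 0.5]
same = phylo_differentiation([[0.25] * 4] * 2, weights, incidence, lengths, 2.0)
disjoint = phylo_differentiation(
    [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]], weights, incidence, lengths, 2.0
)
partial = phylo_differentiation(
    [[0.5, 0.5, 0.0, 0.0], [0.0, 0.5, 0.5, 0.0]], weights, incidence, lengths, 2.0
)
```

In the `disjoint` case the two populations share no branch, so the measure reaches its maximum; in the `partial` case the populations share tip s2 and part of the tree, and the value falls strictly between the two extremes.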

SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represent the ecosystem. Shannon entropies $H_{\ast}$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 diagram. Columns: Within, Between, Total, Decomposition. Rows: Level 3, Ecosystem ($p_{i|++}$, $H_\gamma$); Level 2, Region ($p_{i|+1}$, $p_{i|+2}$; $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$; $H_\alpha^{(2)}$); Level 1, Population or Community ($p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$; $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$, $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$; $H_\alpha^{(1)}$).]

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/alleles abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\ p_1 + p_2,\ p_1 + p_2 + p_3,\ p_4 + p_5\}$, where $p_i$ is the relative abundance of species $i$.

[Figure S2 tree. Tips: p1, p2, p3, p4, p5. Internal nodes: p1 + p2, p4 + p5, p1 + p2 + p3. Root: MRCA.]

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entry. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1: Population 1, Population 2
    Region 2: Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

           Pop1  Pop2  Pop3  Pop4  Pop5
Allele1       1    16     2    10    15
Allele2       0     0     0     5    14
Allele3       7    12    11     1     0
Allele4       0     5    14     1    21
Allele5       2     1     0    11    10
Allele6       0     1     3     2     0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

          Pop1         Pop2         Pop3         Pop4         Pop5
Level 3   Ecosystem    Ecosystem    Ecosystem    Ecosystem    Ecosystem
Level 2   Region1      Region1      Region2      Region2      Region2
Level 1   Population1  Population2  Population3  Population4  Population5

For the above simple example, input data for R function iDIP (code given below) are shown below.

# Input data
Data=cbind(c(1,0,7,0,2,0),c(16,0,12,5,1,1),c(2,0,11,14,0,3),c(10,5,1,1,11,2),c(15,14,0,21,10,0))
Struc=rbind(rep("Ecosystem",5),c(rep("Region1",2),rep("Region2",3)),paste0("Population",1:5))

# Run R function
iDIP(Data,Struc)

Output is given below.

                   [,1]
D_gamma           5.272
D_alpha2          4.679
D_alpha1          3.540
D_beta2           1.127
D_beta1           1.322
Proportion2       0.300
Proportion1       0.700
Differentiation2  0.204
Differentiation1  0.310

We give simple interpretations for the above output (all the "effective" is in a sense of equally abundant alleles/populations/regions).
(1a) D_gamma = 5.272 is interpreted as that the effective number of alleles in the ecosystem (total diversity) is 5.272.
(1b) D_alpha2 = 4.679 is interpreted as that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).
(1c) D_alpha1 = 3.540 is interpreted as that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 is interpreted as that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).
(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.
(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e. the mean proportion of non-shared alleles in a population is around 31.0%.
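The multiplicative and additive readings of this output can be cross-checked by hand. The following worked arithmetic is a sketch that uses only the rounded values of the example output (so agreement holds up to rounding); the notation $D_\beta^{(1)}$, $D_\beta^{(2)}$ for D_beta1 and D_beta2 is the one used in the main text.

```latex
\begin{aligned}
D_\gamma &= D_\alpha^{(1)} \times D_\beta^{(1)} \times D_\beta^{(2)}
          \approx 3.540 \times 1.322 \times 1.127 \approx 5.27, \\[4pt]
\text{Proportion2} &= \frac{\ln D_\beta^{(2)}}{\ln D_\beta^{(1)} + \ln D_\beta^{(2)}}
          \approx \frac{0.1196}{0.2792 + 0.1196} \approx 0.30, \\[4pt]
\text{Proportion1} &= \frac{\ln D_\beta^{(1)}}{\ln D_\beta^{(1)} + \ln D_\beta^{(2)}}
          \approx \frac{0.2792}{0.2792 + 0.1196} \approx 0.70 .
\end{aligned}
```

The proportions follow because the total beta information is $H_\gamma - H_\alpha^{(1)} = \ln D_\beta^{(1)} + \ln D_\beta^{(2)}$, so each level's share is its own log beta diversity divided by that sum.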

# R code for Information-based Diversity Partitioning for Any Number of Levels
# input data:
#   abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
# output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
#   (2) Proportion of total beta information found at each level.
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1 measures the mean
#       dissimilarity among populations (level 1) within a region; Differentiation2 measures the mean
#       dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entry. You must replace "blank" or "NA" in your data by 0
#       or any numerical value. Also, there must be at least one species/allele in a population or a community.

# Main function iDIP
iDIP=function(abun,struc){
  n=sum(abun);N=ncol(abun);
  ga=rowSums(abun);
  gp=ga[ga>0]/n;
  G=sum(-gp*log(gp));   # gamma (total) Shannon entropy
  S=length(gp);
  H=nrow(struc);        # number of hierarchical levels
  A=numeric(H-1);W=numeric(H-1);B=numeric(H-1);
  Diff=numeric(H-1);Prop=numeric(H-1);
  wi=colSums(abun)/n;   # population weights (relative sizes)
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]));
  pi=sapply(1:N,function(k) abun[,k]/sum(abun[,k]));
  Ai=sapply(1:N,function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])));
  A[H-1]=sum(wi*Ai);    # alpha entropy at the lowest level
  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,]);NN=length(I);
      ai=matrix(0,ncol=NN,nrow=nrow(abun));
      for(j in 1:NN){
        II=which(struc[i,]==I[j]);
        if(length(II)==1){ai[,j]=abun[,II]
        }else{ai[,j]=rowSums(abun[,II])}
      }
      pi=sapply(1:NN,function(k) ai[,k]/sum(ai[,k]));
      wi=colSums(ai)/sum(ai);
      W[i-1]=-sum(wi*log(wi));
      Ai=sapply(1:NN,function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])));
      A[i-1]=sum(wi*Ai);
    }
  }
  total=G-A[H-1];
  Diff[1]=(G-A[1])/W[1];
  Prop[1]=(G-A[1])/total;
  B[1]=exp(G)/exp(A[1]);
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1]);
      Prop[i]=(A[i-1]-A[i])/total;
      B[i]=exp(A[i-1])/exp(A[i]);
    }
  }
  Gamma=exp(G);Alpha=exp(A);Diff=Diff;Prop=Prop;
  out=matrix(c(Gamma,Alpha,B,Prop,Diff),ncol=1)
  rownames(out)<-c(paste0("D_gamma"),
                   paste0("D_alpha",(H-1):1),
                   paste0("D_beta",(H-1):1),
                   paste0("Proportion",(H-1):1),
                   paste0("Differentiation",(H-1):1)
  )
  return(out)
}

Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described in the species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).
(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree format. iDIP cannot handle "blank" or "NA" entry. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
(b) Struc: specifying hierarchical structure matrix; see the simple example given above for the species diversity.
(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundances data as in the species diversity for illustration. A simulated phylogenetic tree for 6 species is given below.

# Input data
Data=cbind(c(1,0,7,0,2,0),c(16,0,12,5,1,1),c(2,0,11,14,0,3),c(10,5,1,1,11,2),c(15,14,0,21,10,0))
rownames(Data)=paste0("Allele",1:6)
Struc=rbind(rep("Ecosystem",5),c(rep("Region1",2),rep("Region2",3)),paste0("Community",1:5))
Tree=c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

# Run R function
iDIPphylo(Data,Struc,Tree)

Output is given below.

                [,1]
Faith's PD   321.525
mean_T        94.169
PD_gamma     274.388
PD_alpha2    255.194
PD_alpha1    223.231
PD_beta2       1.075
PD_beta1       1.143
PD_prop2       0.351
PD_prop1       0.649
PD_diff2       0.124
PD_diff1       0.149

The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.
(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.
(2) The weighted (by species abundance) mean of the distances from root node to each of the tips is 94.169.
(3a) PD_gamma = 274.388 is interpreted as that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.
(3b) PD_alpha2 = 255.194 is interpreted as that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma).
(3c) PD_alpha1 = 223.231 is interpreted as that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2).
(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found in the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found in the community level is 64.9%.
(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e. the mean proportion of non-shared lineages in a community is around 14.9%.

# R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
# input data:
#   abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
#   tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study
# NOTE: iDIP cannot handle "blank" or "NA" entry. You must replace "blank" or "NA" in your data by 0
#       or any numerical value. Also, there must be at least one species/allele in a population or a community.
# output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
#   (2) mean_T: weighted (by species abundance) mean of the distances from root node to each of the tips
#       in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level.
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information found in
#       Level 1 and Level 2, respectively.
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1 measures the mean
#       phylogenetic dissimilarity among communities (Level 1) within a region (Level 2); PD_diff2 measures
#       the mean phylogenetic dissimilarity among regions (Level 2), etc.

# Three packages "ade4", "ape" and "phytools" must be installed first.
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

# Main function iDIPphylo
iDIPphylo=function(abun,struc,tree){
  phyloData<-newick2phylog(tree)
  Temp<-as.matrix(abun[names(phyloData$leaves),])
  nodenames=c(names(phyloData$leaves),names(phyloData$nodes));
  # M[i,b]=1 if node/branch b lies on the path from the root to leaf i
  M=matrix(0,nrow=length(phyloData$leaves),ncol=length(nodenames),dimnames=list(names(phyloData$leaves),nodenames))
  for(i in 1:length(phyloData$leaves)){M[i,][unlist(phyloData$paths[i])]=rep(1,length(unlist(phyloData$paths[i])))}
  # pA: node-by-population abundances; pB: branch lengths
  pA=matrix(0,ncol=ncol(abun),nrow=length(nodenames),dimnames=list(nodenames,colnames(abun)))
  for(i in 1:ncol(abun)){pA[,i]=Temp[,i]%*%M}
  pB=c(phyloData$leaves,phyloData$nodes)
  n=sum(abun);N=ncol(abun);
  ga=rowSums(pA);
  gp=ga/n;TT=sum(gp*pB);
  G=sum(-pB[gp>0]*gp[gp>0]/TT*log(gp[gp>0]/TT))
  PD=sum(pB[gp>0]);
  H=nrow(struc);
  A=numeric(H-1);W=numeric(H-1);B=numeric(H-1);
  Diff=numeric(H-1);Prop=numeric(H-1);
  wi=colSums(abun)/n;
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]));
  pi=sapply(1:N,function(k) pA[,k]/sum(abun[,k]));
  Ai=sapply(1:N,function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)));
  A[H-1]=sum(wi*Ai);
  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,]);NN=length(I);
      pi=matrix(0,ncol=NN,nrow=nrow(pA));ni=numeric(NN);
      for(j in 1:NN){
        II=which(struc[i,]==I[j]);
        if(length(II)==1){pi[,j]=pA[,II]/sum(abun[,II]);ni[j]=sum(abun[,II])
        }else{pi[,j]=rowSums(pA[,II])/sum(abun[,II]);ni[j]=sum(abun[,II])}
      }
      wi=ni/sum(ni);
      W[i-1]=-sum(wi*log(wi));
      Ai=sapply(1:NN,function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)));
      A[i-1]=sum(wi*Ai);
    }
  }
  total=G-A[H-1];
  Diff[1]=(G-A[1])/W[1];
  Prop[1]=(G-A[1])/total;
  B[1]=exp(G)/exp(A[1]);
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1]);
      Prop[i]=(A[i-1]-A[i])/total;
      B[i]=exp(A[i-1])/exp(A[i]);
    }
  }
  Gamma=exp(G);Alpha=exp(A);Diff=Diff;Prop=Prop;
  out=matrix(c(PD,TT,Gamma,Alpha,B,Prop,Diff),ncol=1)
  rownames(out)<-c(paste("Faith's PD"),
                   paste("mean_T"),
                   paste0("PD_gamma"),
                   paste0("PD_alpha",(H-1):1),
                   paste0("PD_beta",(H-1):1),
                   paste0("PD_prop",(H-1):1),
                   paste0("PD_diff",(H-1):1)
  )
  return(out)
}


among populations within regions. Note that this pattern contrasts with that observed for species diversity, in which differentiation was greater between regions than between islands within regions. Note also that, despite the similarities in the partitioning of genetic diversity across spatial scales, genetic differentiation is much stronger in E. coruscans than Z. flavescens, a difference that may be explained by the fact that the deep-water habitat occupied by the former may have lower water movement than the shallow waters inhabited by the latter and, therefore, may lead to large differences in larval dispersal potential between the two species.

Overall, allelic diversity of all orders (q = 0, 1, 2) is much less spatially variable than species diversity (Figure 5). This is particularly true for Z. flavescens (Figure 5c), whose high larval dispersal potential may help maintain similar genetic diversity levels (and low genetic differentiation) across populations.

As was the case for species diversity, genetic diversity in MHI is somewhat higher than that observed in NWHI, despite its higher level of anthropogenic perturbations (Figure 6b,c).

8 | DISCUSSION

Biodiversity is an inherently hierarchical concept covering several levels of organization and spatial scales. However, until now, we did not have a framework for measuring all spatial components of biodiversity applicable to both genetic and species diversities. Here, we use an information-based measure (Hill number of order q = 1) to decompose global genetic and species diversity into their various regional- and community/population-level components. The framework is applicable to hierarchical spatially structured scenarios with any number of levels (ecosystem, region, subregion, …, community/population). We also developed a similar framework for the decomposition of phylogenetic diversity across multiple-level hierarchically structured systems. To illustrate the usefulness of our framework, we used both simulated data with known diversity structure and a real data set, stressing the importance of the decomposition for various applications including biological conservation. In what follows, we first discuss several aspects of our formulation in terms of species and genetic diversity and then briefly address the formulation in terms of phylogenetic diversity.

Hill numbers are parameterized by order q, which determines the sensitivity of the diversity measure to common and rare elements (alleles or species). Our framework is based on a Hill number of order q = 1, which weights all elements in proportion to their frequency and leads to diversity measures based on Shannon's entropy. This is a fundamentally important property from a population genetics point of view because it contrasts with measures based on heterozygosity, which are of order q = 2 and, therefore, give a disproportionate weight to common alleles. Indeed, it is well known that heterozygosity and related measures are insensitive to changes in the allele frequencies of rare alleles (e.g., Allendorf, Luikart, & Aitken, 2012), so they perform poorly when used on their own to detect important demographic changes in the evolutionary history of populations and species (e.g.,

FIGURE 5 Diversity measures at all sampled islands (communities/populations) expressed in terms of Hill numbers of orders q = 0, 1 and 2. (a) Fish species diversity of Hawaiian coral reef communities, (b) genetic diversity for Etelis coruscans, (c) genetic diversity for Zebrasoma flavescens


bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.
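How the order q changes the weight given to rare elements can be seen in a small Python sketch. The function name and the two frequency vectors are hypothetical and chosen only for illustration; the formulas are the standard Hill number of order q, with the q = 1 case taken as the exponential of Shannon entropy.

```python
import math

def hill_number(p, q):
    """Hill number (effective number of elements) of order q for relative frequencies p."""
    p = [x for x in p if x > 0]
    if q == 1:
        # Limit q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

even = [0.25, 0.25, 0.25, 0.25]
uneven = [0.70, 0.20, 0.05, 0.03, 0.02]  # hypothetical frequencies with several rare elements

profile_even = [hill_number(even, q) for q in (0, 1, 2)]
profile_uneven = [hill_number(uneven, q) for q in (0, 1, 2)]
```

For the even community all three orders coincide with the number of elements, whereas for the uneven one the profile drops as q increases, because orders 1 and 2 progressively discount the rare elements that order 0 counts in full.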

As proved by Chao et al. (2015, Appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost's D, do not possess any of these two properties.

Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and

FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness, (b) allelic richness for Etelis coruscans, (c) allelic richness for Zebrasoma flavescens

[Figure 6 panel titles: (a) Species diversity; (b) Genetic diversity, E. coruscans; (c) Genetic diversity, Z. flavescens]

TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level                       Diversity
3 Hawaiian Archipelago      $D_\gamma = 8.249$
2 Region                    $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)} = 8.083$, $D_\beta^{(2)} = 1.016$
1 Island (population)       $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)} = 7.077$, $D_\beta^{(1)} = 1.117$

Differentiation among aggregates at each level
2 Region                    $\Delta_D^{(2)} = 0.023$
1 Island (community)        $\Delta_D^{(1)} = 0.062$


our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.

Recently, Karlin and Smouse (2017; Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then intuitively any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that, when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
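The "true dissimilarity" example can be checked numerically for the Shannon differentiation measure. The sketch below is illustrative only: the function name and the labelling of shared and private elements are hypothetical, the communities are assumed to be equally weighted with at least two communities, and nothing here reproduces Karlin and Smouse's measure.

```python
import math

def shannon_differentiation_equal_communities(n_comm, s, a):
    """Shannon differentiation for n_comm equally weighted communities, each with
    s equally common elements, of which a are shared by all communities."""
    weight = 1.0 / n_comm
    pooled = {}
    for j in range(n_comm):
        for i in range(s):
            # Shared elements get a common label; private ones are labelled by community.
            key = ("shared", i) if i < a else (j, i)
            pooled[key] = pooled.get(key, 0.0) + weight / s
    h_gamma = -sum(x * math.log(x) for x in pooled.values())
    h_alpha = math.log(s)        # every community is perfectly even
    h_max = math.log(n_comm)     # -sum_j w_j ln w_j for equal weights
    return (h_gamma - h_alpha) / h_max

# Two populations, 10 equally frequent alleles each, 4 of them shared.
value = shannon_differentiation_equal_communities(2, 10, 4)
```

In this setting the pooled entropy works out to $\ln S + (1 - A/S)\ln N$, so the ratio equals $1 - A/S$ for any number of communities; for the two-population example this is 0.6.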

In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would be surely observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.

Our simulation study clearly shows that the diversity measuresderivedfromourframeworkcanaccuratelydescribecomplexhierar-chical structuresForexampleourbetadiversityDβ and differentia-tion ΔD measures can uncover the increase in differentiation between marginalandwell-connectedsubregionswithinaregionasspatialcor-relationacrosspopulations(controlledbytheparameterδ in our sim-ulations)diminishes(Figure3)Indeedthestrengthofthehierarchicalstructurevaries inacomplexwaywithδ Structuring within regions declines steadily as δ increases but structuring between subregions within a region first increases and then decreases as δ increases (see Figure2)Nevertheless forvery largevaluesofδ hierarchical struc-turing disappears completely across all levels generating spatial ge-neticpatternssimilartothoseobservedfortheislandmodelAmoredetailedexplanationofthemechanismsinvolvedispresentedintheresults section

The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species and, therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.

Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural

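TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

| Level | Diversity |
|---|---|
| 3 Hawaiian Archipelago | $D_\gamma = 8.404$ |
| 2 Region | $D_\gamma^{(2)} = D_\gamma$, $D_\alpha^{(2)} = 8.290$, $D_\beta^{(2)} = 1.012$ |
| 1 Island (community) | $D_\gamma^{(1)} = D_\alpha^{(2)}$, $D_\alpha^{(1)} = 7.690$, $D_\beta^{(1)} = 1.065$ |

Differentiation among aggregates at each level

| Level | Differentiation |
|---|---|
| 2 Region | $\Delta_D^{(2)} = 0.014$ |
| 1 Island (community) | $\Delta_D^{(1)} = 0.033$ |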


populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization and, therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.

In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)'s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)'s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.

There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level is transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithm may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.

The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.

An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.


ACKNOWLEDGEMENTS

This work was assisted through participation in the "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.

DATA ARCHIVING STATEMENT

All data used in this manuscript are available in DRYAD (https://doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).

ORCID

Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493

Christine Edwards http://orcid.org/0000-0001-8837-4872

REFERENCES

Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2

Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101

Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.

Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665

Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.

Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.

Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654

Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551

Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247

Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272

Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540

Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349

Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471

Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014

Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819

Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083

Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1

Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2

Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036

Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75

Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352

Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x

Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1

Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x

Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1

Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x

Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum x falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.


Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.

Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.

Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3

MacArthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x

Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.

Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299

Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x

Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x

Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.

Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x

Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.

Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.

Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696

Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B-Biological Sciences, 283(1829).

Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z

Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765

Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x

Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.

Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422

Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756

de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705

Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.

Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x

Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516

Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.

Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593


APPENDIX S1 Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

| Allele | Population 1 | Population 2 |
|---|---|---|
| A1 | 0.5 | 0.01 |
| A2 | 0.5 | 0.10 |
| A3 | 0 | 0.665 |
| A4 | 0 | 0.215 |
| A5 | 0 | 0.01 |
| He | 0.5 | 0.5 |
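The equivalence in the table above can be checked numerically. The following is a minimal sketch, assuming the heterozygosity-based effective number of alleles $1/(1 - H_e) = 1/\sum_i p_i^2$; the function names are illustrative and are not taken from the R package mentioned in the main text.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def effective_alleles_order2(freqs):
    """Heterozygosity-based effective number of alleles, 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


# Allele frequencies from the Appendix S1 table (alleles A1..A5).
pop1 = [0.5, 0.5, 0.0, 0.0, 0.0]
pop2 = [0.01, 0.10, 0.665, 0.215, 0.01]

he1 = expected_heterozygosity(pop1)
he2 = expected_heterozygosity(pop2)
ne1 = effective_alleles_order2(pop1)
ne2 = effective_alleles_order2(pop2)

print(f"He: pop1 = {he1:.3f}, pop2 = {he2:.3f}")
print(f"Effective alleles: pop1 = {ne1:.3f}, pop2 = {ne2:.3f}")
```

Both populations give a heterozygosity of about 0.5 and an effective number of about two, which is the sense in which they belong to the same equivalence class despite having two and five distinct alleles, respectively.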

APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol: Definition

$D_q$: abundance diversity of order q, also referred to as Hill number of order q
$PD_q$: phylogenetic diversity of order q
$S$: total number of distinct elements (species/alleles)
$H$: Shannon entropy
$K$: number of regions
$J_k$: number of local populations/communities within region k
$w_{jk}$: weight given to population/community j of region k
$w_{+k}$: weight given to region k
$D_\alpha^{(l)}$: alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$: beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$: gamma abundance diversity at level l of the hierarchy
$l$: superscript to denote population/community, subregion, region, …
$D_\gamma$: gamma abundance diversity at the ecosystem level
$N_{injk}$: number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$: total number of copies of allele/species i in population/community j of region k
$N_{+jk}$: total number of alleles/individuals in population j of region k
$N_{++k}$: total number of alleles/individuals in region k
$N_{+++}$: total number of alleles/individuals in the ecosystem
$p_{i|jk}$: frequency of allele/species i in population j of region k
$p_{i|+k}$: frequency of allele/species i in region k
$p_{i|++}$: frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$: alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$: alpha entropy for region k
$H_\alpha^{(l)}$: total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$: abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$: number of branch segments in the phylogenetic tree
$L_i$: length of branch i = 1, 2, 3, …, B
$a_i$: total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$: mean branch length
$T$: depth of an ultrametric tree ($= \bar{T}$)
$Q$: Rao's quadratic entropy
$I$: phylogenetic entropy
$a_{i|jk}$: total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$: total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$: total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$: alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$: beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$: gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$: gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$: alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$: alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$: total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$: phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)


APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ln \left( \sum_{m=1}^{J} w_m p_{i|m} \right)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \ln x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ln \left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ln \left( \sum_{j=1}^{J} w_j p_{i|j} \right) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

$$H_\gamma = -\sum_{i=1}^{S} \left\{ \sum_{j=1}^{J} w_j p_{i|j} \ln \left( \sum_{m=1}^{J} w_m p_{i|m} \right) \right\} \le -\sum_{i=1}^{S} \left\{ \sum_{j=1}^{J} w_j p_{i|j} \ln \left( w_j p_{i|j} \right) \right\}$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
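The bounds in the theorem can be illustrated numerically. The sketch below assumes three hypothetical populations with arbitrary weights and made-up allele frequencies (none of these values come from the paper) and evaluates $H_\gamma - H_\alpha$ for the identical, completely distinct and partially overlapping cases.

```python
import math


def shannon_entropy(freqs):
    """Shannon entropy -sum(p ln p), skipping zero frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def gamma_alpha_entropies(pops, weights):
    """Return (H_gamma, H_alpha) for populations given as frequency lists."""
    n_alleles = len(pops[0])
    # Pooled frequency p_{i|+} = sum_j w_j p_{i|j}
    pooled = [sum(w * pop[i] for w, pop in zip(weights, pops))
              for i in range(n_alleles)]
    h_gamma = shannon_entropy(pooled)
    h_alpha = sum(w * shannon_entropy(pop) for w, pop in zip(weights, pops))
    return h_gamma, h_alpha


def entropy_gap(pops, weights):
    """H_gamma - H_alpha for the given populations and weights."""
    h_gamma, h_alpha = gamma_alpha_entropies(pops, weights)
    return h_gamma - h_alpha


# Hypothetical weights and frequencies, chosen only for illustration.
weights = [0.2, 0.3, 0.5]
upper_bound = shannon_entropy(weights)  # -sum_j w_j ln w_j

identical = [[0.6, 0.4, 0.0, 0.0, 0.0, 0.0]] * 3
distinct = [[0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.3, 0.7, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0, 0.9, 0.1]]
overlapping = [[0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
               [0.2, 0.0, 0.8, 0.0, 0.0, 0.0],
               [0.0, 0.1, 0.0, 0.4, 0.5, 0.0]]

gap_identical = entropy_gap(identical, weights)
gap_distinct = entropy_gap(distinct, weights)
gap_overlapping = entropy_gap(overlapping, weights)

print(gap_identical, gap_overlapping, gap_distinct, upper_bound)
```

The identical case sits at the lower bound, the distinct case attains the upper bound, and the overlapping case falls strictly in between.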


From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:

(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.

(A2) Shannon differentiation measure is always non-decreasing when a new allele is added to a single population with any abundance. Here, the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: If multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then Shannon differentiation measure gives $1 - A/S$, the true proportion of non-shared species in a community.
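Part (B) can be illustrated with a small numerical example. The sketch below builds hypothetical communities with S equally common species, A of them shared by all, and evaluates Eq. (C1); the helper names are illustrative, and equal community weights are an assumption of this sketch.

```python
import math


def shannon_entropy(freqs):
    """Shannon entropy -sum(p ln p), skipping zero frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def shannon_differentiation(pops, weights):
    """Normalized Shannon differentiation, Eq. (C1)."""
    n_species = len(pops[0])
    pooled = [sum(w * pop[i] for w, pop in zip(weights, pops))
              for i in range(n_species)]
    h_gamma = shannon_entropy(pooled)
    h_alpha = sum(w * shannon_entropy(pop) for w, pop in zip(weights, pops))
    return (h_gamma - h_alpha) / shannon_entropy(weights)


def build_communities(n_comm, s, a):
    """n_comm communities, each with s equally common species, a shared by all."""
    total = a + n_comm * (s - a)
    communities = []
    for c in range(n_comm):
        freqs = [0.0] * total
        for i in range(a):                      # species shared by all
            freqs[i] = 1.0 / s
        start = a + c * (s - a)
        for i in range(start, start + s - a):   # species unique to community c
            freqs[i] = 1.0 / s
        communities.append(freqs)
    return communities


# Hypothetical setting: 4 communities, S = 10 species each, A = 3 shared.
n_comm, s, a = 4, 10, 3
communities = build_communities(n_comm, s, a)
equal_weights = [1.0 / n_comm] * n_comm
delta = shannon_differentiation(communities, equal_weights)

print(delta, 1 - a / s)
```

With equal weights, $H_\alpha = \ln S$ and $H_\gamma = \ln S + (1 - A/S) \ln N$ for N communities, so the ratio in Eq. (C1) reduces to $1 - A/S$, which is what the script reports.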

Proof. For Part (A), see Appendix S6 in Chao, Jost et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \left[ H_\alpha^{(2)} - H_\alpha^{(1)} \right] + \left[ H_\gamma - H_\alpha^{(2)} \right]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \left( \sum_{k=1}^{K} w_{+k} p_{i|+k} \right) \ln \left( \sum_{m=1}^{K} w_{+m} p_{i|+m} \right).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele i in population j, with population weights $(w_{1k}/w_{+k}, w_{2k}/w_{+k}, \ldots, w_{J_k k}/w_{+k})$, $j = 1, 2, \ldots, J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k}) \, p_{i|jk}$. The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$


Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \left( w_{jk}/w_{+k} \right)}. \tag{C6}$$

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma diversity as

$$\begin{aligned}
H_{\gamma} &= -\sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \left( w_{jk} p_{i|jk} \right) \right\} \\
&= -\sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \left[ \ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k} \right] \right\} \\
&= -\sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk} \right\} - \sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \frac{w_{jk}}{w_{+k}} \right\} - \sum_{i=1}^{S} \left\{ \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k} \right\} \\
&= H_{\alpha}^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_{\alpha}^{(1)} + \left[ H_{\alpha}^{(2)} - H_{\alpha}^{(1)} \right] + \left[ H_{\gamma} - H_{\alpha}^{(2)} \right].
\end{aligned}$$

In this special case, we have $H_{\alpha}^{(2)} - H_{\alpha}^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}$ and $H_{\gamma} - H_{\alpha}^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
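
The two bounds can be checked numerically. The following R sketch is an illustration written for this appendix, not part of the authors' code: the helper names `shannon` and `three_level` and both small data sets are hypothetical, and population weights are assumed to be proportional to sample size (as in the iDIP function). It computes the three entropies and the two normalized measures for one data set in which no allele is shared between populations and one in which all populations have identical allele frequencies.

```r
# Shannon entropy of a vector of relative frequencies (0 * log 0 is taken as 0).
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Hypothetical helper: entropies and normalized differentiation measures for a
# three-level hierarchy. `abun` is an allele-by-population count matrix and
# `region` gives the region label of each population (column).
three_level <- function(abun, region) {
  n    <- sum(abun)
  w_jk <- colSums(abun) / n              # population weights w_jk (size-based, an assumption)
  w_k  <- tapply(w_jk, region, sum)      # region weights w_+k
  H_gamma <- shannon(rowSums(abun) / n)
  # Pool populations within each region to obtain p_{i|+k}
  reg_abun <- sapply(names(w_k),
                     function(r) rowSums(abun[, region == r, drop = FALSE]))
  H_alpha2 <- sum(w_k * apply(reg_abun, 2, function(x) shannon(x / sum(x))))
  H_alpha1 <- sum(w_jk * apply(abun, 2, function(x) shannon(x / sum(x))))
  max2 <- shannon(w_k)                   # -sum_k w_+k ln w_+k
  max1 <- shannon(w_jk) - shannon(w_k)   # -sum_k sum_j w_jk ln(w_jk / w_+k)
  c(H_gamma = H_gamma, H_alpha2 = H_alpha2, H_alpha1 = H_alpha1,
    Delta2 = (H_gamma - H_alpha2) / max2,
    Delta1 = (H_alpha2 - H_alpha1) / max1)
}

region <- c("R1", "R1", "R2", "R2")

# Four populations with no shared alleles: both measures should reach 1.
distinct <- cbind(c(3, 1, 0, 0, 0, 0, 0, 0),
                  c(0, 0, 2, 2, 0, 0, 0, 0),
                  c(0, 0, 0, 0, 5, 1, 0, 0),
                  c(0, 0, 0, 0, 0, 0, 1, 3))
res_distinct <- three_level(distinct, region)

# Four populations with identical allele frequencies: both measures should be 0.
identical_pops <- cbind(c(2, 1), c(4, 2), c(2, 1), c(6, 3))
res_identical <- three_level(identical_pops, region)

print(round(res_distinct, 4))
print(round(res_identical, 4))
```

The denominator of `Delta1` uses the identity $-\sum_k \sum_j w_{jk} \ln (w_{jk}/w_{+k}) = -\sum_k \sum_j w_{jk} \ln w_{jk} + \sum_k w_{+k} \ln w_{+k}$, which is why it is written as a difference of two entropies of the weights.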

As we proved in Theorem C3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem C2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. Shannon gamma and alpha entropies are expressed as (Table 3 of the main text):

$$I_{\gamma} = -\sum_{i} L_i a_{i|++} \ln a_{i|++},$$

$$I_{\alpha}^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i} L_i a_{i|+k} \ln a_{i|+k},$$

$$I_{\alpha}^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i} L_i a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_{\gamma} = I_{\alpha}^{(1)} + \left[ I_{\alpha}^{(2)} - I_{\alpha}^{(1)} \right] + \left[ I_{\gamma} - I_{\alpha}^{(2)} \right]. \qquad \text{(C7)}$$

Here $I_{\alpha}^{(1)}$ denotes the within-population phylogenetic information, $[I_{\alpha}^{(2)} - I_{\alpha}^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_{\gamma} - I_{\alpha}^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem, so that we can obtain the corresponding differentiation measures.

Theorem C4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_{\gamma} - I_{\alpha}^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C8)}$$

$$0 \le I_{\alpha}^{(2)} - I_{\alpha}^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \qquad \text{(C9)}$$

When all populations have identical allele relative frequency distributions, we have $I_{\gamma} = I_{\alpha}^{(2)} = I_{\alpha}^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_{\alpha}^{(2)} - I_{\alpha}^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}$ and $I_{\gamma} - I_{\alpha}^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.

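SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represent the ecosystem. Shannon entropies $H_{*}$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using Eq. 1.

[Figure S1 diagram. Columns: Within, Between, Total, Decomposition. Rows: Level 3 (Ecosystem), Level 2 (Region), Level 1 (Population or Community). Labels shown: population frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$ with entropies $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$, $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$ and $H_{\alpha}^{(1)}$; region frequencies $p_{i|+1}$, $p_{i|+2}$ with entropies $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$ and $H_{\alpha}^{(2)}$; ecosystem frequencies $p_{i|++}$ with entropy $H_{\gamma}$.]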

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/alleles abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5, p_1 + p_2, p_1 + p_2 + p_3, p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2 diagram. Tips: $p_1$, $p_2$, $p_3$, $p_4$, $p_5$. Internal nodes: $p_1 + p_2$, $p_1 + p_2 + p_3$, $p_4 + p_5$. Root: MRCA.]
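
As a minimal sketch of how the extended abundance set of Figure S2 can be assembled, the R lines below use a node-by-tip membership matrix. The tip abundances are made-up values and the labels `a1` to `a8` are chosen here for illustration; only the tree shape is taken from the figure.

```r
# Relative abundances of the five tips (hypothetical values that sum to 1).
p <- c(p1 = 0.30, p2 = 0.10, p3 = 0.20, p4 = 0.25, p5 = 0.15)

# Node-by-tip membership matrix: entry 1 if the tip descends from the node.
# Rows 1-5 are the tips themselves; rows 6-8 are the internal nodes of Figure S2.
M <- rbind(diag(5),
           c(1, 1, 0, 0, 0),   # node with abundance p1 + p2
           c(1, 1, 1, 0, 0),   # node with abundance p1 + p2 + p3
           c(0, 0, 0, 1, 1))   # node with abundance p4 + p5
rownames(M) <- paste0("a", 1:8)

# Extended abundance set {a1, ..., a8}: tips first, then internal nodes.
a <- drop(M %*% p)
print(a)
```

The same matrix product appears in the iDIPphylo function given later in this supplement, where the membership matrix is built from the root-to-tip paths of the tree.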

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Suppose there are two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as follows:

Ecosystem
    Region 1
        Population 1    Population 2
    Region 2
        Population 3    Population 4    Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

            Pop1    Pop2    Pop3    Pop4    Pop5
Allele1     1       16      2       10      15
Allele2     0       0       0       5       14
Allele3     7       12      11      1       0
Allele4     0       5       14      1       21
Allele5     2       1       0       11      10
Allele6     0       1       3       2       0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

            Pop1          Pop2          Pop3          Pop4          Pop5
Level 3     Ecosystem     Ecosystem     Ecosystem     Ecosystem     Ecosystem
Level 2     Region1       Region1       Region2       Region2       Region2
Level 1     Population1   Population2   Population3   Population4   Population5

For the above simple example, input data for R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

# Run R function
iDIP(Data, Struc)

Output is given below:

                     [,1]
D_gamma             5.272
D_alpha2            4.679
D_alpha1            3.540
D_beta2             1.127
D_beta1             1.322
Proportion2         0.300
Proportion1         0.700
Differentiation2    0.204
Differentiation1    0.310

We give simple interpretations for the above output ("effective" is always in the sense of equally abundant alleles/populations/regions):

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).

(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).

(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.310, i.e. the mean proportion of non-shared alleles in a population is around 31.0%.
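
The effective numbers quoted in the interpretation can be recomputed directly from the example matrix. The sketch below is an independent, compact recalculation written for illustration and is not the iDIP function itself; the variable names and the `shannon` helper are choices made here, and weights are assumed proportional to sample size. It exponentiates the Shannon entropy at each level and forms the beta diversities as ratios.

```r
# Example allele counts (alleles in rows, populations in columns).
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)   # region membership of the five populations

# Shannon entropy computed from a vector of counts.
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

n <- sum(Data)
H_gamma <- shannon(rowSums(Data))

# Region level: pool counts within each region, weight by region size.
reg <- sapply(unique(region),
              function(r) rowSums(Data[, region == r, drop = FALSE]))
H_alpha2 <- sum(colSums(reg) / n * apply(reg, 2, shannon))

# Population level: weight each population by its sample size.
H_alpha1 <- sum(colSums(Data) / n * apply(Data, 2, shannon))

D_gamma  <- exp(H_gamma)
D_alpha2 <- exp(H_alpha2)
D_alpha1 <- exp(H_alpha1)
D_beta2  <- D_gamma / D_alpha2    # region equivalents
D_beta1  <- D_alpha2 / D_alpha1   # population equivalents per region

print(round(c(D_gamma = D_gamma, D_alpha2 = D_alpha2, D_alpha1 = D_alpha1,
              D_beta2 = D_beta2, D_beta1 = D_beta1), 3))
```

Because each beta diversity is defined as a ratio of the diversities one level apart, the products D_alpha2 x D_beta2 = D_gamma and D_alpha1 x D_beta1 = D_alpha2 hold exactly before rounding; small discrepancies in the rounded figures are rounding effects only.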

# R code for Information-based Diversity Partitioning for Any Number of Levels
#
# Input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community
#          frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix)
#
# Output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
#   (2) Proportion of total beta information found at each level.
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
#
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
# or "NA" in your data by 0 or any numerical value. Also, there must be at least
# one species/allele in a population or a community.

# Main function iDIP
iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga > 0] / n
  G = sum(-gp * log(gp))
  S = length(gp)

  H = nrow(struc)
  A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
  Diff = numeric(H-1); Prop = numeric(H-1)

  wi = colSums(abun) / n
  W[H-1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H-1] = sum(wi * Ai)

  if (H > 2) {
    for (i in 2:(H-1)) {
      I = unique(struc[i, ]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          ai[, j] = abun[, II]
        } else {
          ai[, j] = rowSums(abun[, II])
        }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i-1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i-1] = sum(wi * Ai)
    }
  }

  total = G - A[H-1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H-1)) {
      Diff[i] = (A[i-1] - A[i]) / (W[i] - W[i-1])
      Prop[i] = (A[i-1] - A[i]) / total
      B[i] = exp(A[i-1]) / exp(A[i])
    }
  }

  Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha", (H-1):1),
                     paste0("D_beta", (H-1):1),
                     paste0("Proportion", (H-1):1),
                     paste0("Differentiation", (H-1):1))
  return(out)
}

Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described in the species diversity section, we also need to input a phylogenetic "Tree" in Newick tree format.

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for the species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity section for illustration. A simulated phylogenetic tree for 6 species is given below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) = paste0("Allele",1:6)
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
Tree = c("(((Allele1:1.666254448,Allele2:2.886156926):4.370264926,Allele3:5.919367445):4.349065302,(Allele4:0.967060281,Allele5:4.965919121,Allele6:1.5361314):5.492297125);")

# Run R function
iDIPphylo(Data, Struc, Tree)

Output is given below:

                  [,1]
Faith's PD     32.1525
mean_T          9.4169
PD_gamma       27.4388
PD_alpha2      25.5194
PD_alpha1      22.3231
PD_beta2        1.075
PD_beta1        1.143
PD_prop2        0.351
PD_prop1        0.649
PD_diff2        0.124
PD_diff1        0.149

The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 32.1525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 9.4169.

(3a) PD_gamma = 27.4388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 27.4388.

(3b) PD_alpha2 = 25.5194 means that the effective total branch length per region is 25.5194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 25.5194 x 1.075 = 27.4388 (= PD_gamma).

(3c) PD_alpha1 = 22.3231 means that the effective total branch length per population within each region is 22.3231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 22.3231 x 1.143 = 25.5194 (= PD_alpha2).

(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e. the mean proportion of non-shared lineages in a community is around 14.9%.
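
The first two output rows can be checked by hand without any phylogenetics package. The sketch below is an illustrative base-R recalculation, not the authors' function: the branch lengths are transcribed from the example Newick string with decimal points placed as reconstructed here, and the internal-node labels `node12`, `node123` and `node456` are hypothetical names introduced for readability.

```r
# Branch lengths; each name refers to the node at the lower end of the branch.
L <- c(Allele1 = 1.666254448, Allele2 = 2.886156926, Allele3 = 5.919367445,
       Allele4 = 0.967060281, Allele5 = 4.965919121, Allele6 = 1.5361314,
       node12  = 4.370264926,   # ancestor of Allele1 and Allele2
       node123 = 4.349065302,   # ancestor of Allele1, Allele2 and Allele3
       node456 = 5.492297125)   # ancestor of Allele4, Allele5 and Allele6

# Example allele counts (alleles in rows, populations in columns).
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))

# Pooled relative abundances of the six alleles over all populations.
p <- rowSums(Data) / sum(Data)

# Node abundances: each tip keeps its own abundance, each internal node gets
# the summed abundance of its descendant tips (same order as L).
a <- c(p, sum(p[1:2]), sum(p[1:3]), sum(p[4:6]))

faith_PD <- sum(L)       # total branch length of the tree
mean_T   <- sum(L * a)   # abundance-weighted mean root-to-tip distance

print(round(c(faith_PD = faith_PD, mean_T = mean_T), 4))
```

Summing branch length times node abundance is equivalent to averaging the root-to-tip distances with the tip abundances as weights, since every branch on a tip's path to the root is counted once for that tip.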

# R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
#
# Input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community
#          frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix)
#   tree:  a Newick-format phylogenetic tree spanned by all focal species considered
#          in a study
#
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank"
# or "NA" in your data by 0 or any numerical value. Also, there must be at least
# one species/allele in a population or a community.
#
# Output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root
#       node to each of the tips in a phylogenetic tree. For an ultrametric tree,
#       mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD
#       for each level.
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta
#       information found in Level 1 and Level 2, respectively.
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example,
#       PD_diff1 measures the mean phylogenetic dissimilarity among communities
#       (Level 1) within a region (Level 2); PD_diff2 measures the mean phylogenetic
#       dissimilarity among regions (Level 2), etc.

# Three packages, "ade4", "ape" and "phytools", must be installed first
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

# Main function iDIPphylo
iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # Tip-by-node membership matrix built from the root-to-tip paths
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }

  # Node abundances in each population
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
  pB = c(phyloData$leaves, phyloData$nodes)   # branch lengths

  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n; TT = sum(gp * pB)
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])

  H = nrow(struc)
  A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
  Diff = numeric(H-1); Prop = numeric(H-1)

  wi = colSums(abun) / n
  W[H-1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H-1] = sum(wi * Ai)

  if (H > 2) {
    for (i in 2:(H-1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else {
          pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
        }
      }
      wi = ni / sum(ni)
      W[i-1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i-1] = sum(wi * Ai)
    }
  }

  total = G - A[H-1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H-1)) {
      Diff[i] = (A[i-1] - A[i]) / (W[i] - W[i-1])
      Prop[i] = (A[i-1] - A[i]) / total
      B[i] = exp(A[i-1]) / exp(A[i])
    }
  }

  Gamma = exp(G) * TT; Alpha = exp(A) * TT; Diff = Diff; Prop = Prop
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H-1):1),
                     paste0("PD_beta", (H-1):1),
                     paste0("PD_prop", (H-1):1),
                     paste0("PD_diff", (H-1):1))
  return(out)
}


14 | GAGGIOTTI ET AL.

bottlenecks). That said, it is still very useful to characterize diversity of local populations and communities using Hill numbers of order q = 0, 1, 2 to obtain a comprehensive description of biodiversity at this scale. For example, a diversity of order q = 0 much larger than those of order q = 1, 2 indicates that populations/communities contain several rare alleles/species, so that alleles/species relative frequencies are highly uneven. Also, very similar diversities of order q = 1, 2 indicate that the population/community is dominated by few alleles/species. We exemplify this use with the analysis of the Hawaiian archipelago data set (Figure 5). A continuous diversity profile, which depicts Hill number with respect to the order q ≥ 0, contains all information about alleles/species abundance distributions.
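
To make the comparison of orders concrete, the following R sketch computes Hill numbers for an even and an uneven assemblage. It is an illustration only: the function name `hill` and both count vectors are hypothetical and are not taken from the paper's data or code.

```r
# Hill number (effective number of types) of order q from a vector of counts.
# For q = 1 the limit exp(Shannon entropy) is used.
hill <- function(x, q) {
  p <- x[x > 0] / sum(x)
  if (abs(q - 1) < 1e-8) {
    exp(-sum(p * log(p)))
  } else {
    sum(p^q)^(1 / (1 - q))
  }
}

even   <- c(25, 25, 25, 25)    # four equally common alleles
uneven <- c(90, 5, 3, 1, 1)    # one dominant allele and four rare ones

orders <- c(0, 0.5, 1, 1.5, 2)
profile_even   <- sapply(orders, function(q) hill(even, q))
profile_uneven <- sapply(orders, function(q) hill(uneven, q))

print(rbind(q = orders, even = profile_even, uneven = round(profile_uneven, 3)))
```

For the even assemblage the profile is flat at 4, whereas for the uneven one the Hill number drops from 5 at q = 0 towards values close to 1 at q = 2, which is the pattern described above for assemblages containing several rare alleles or species.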

As proved by Chao et al. (2015, appendix S6) and stated in Appendix S3, information-based differentiation measures such as those we propose here (Table 3) possess two essential monotonicity properties that heterozygosity-based differentiation measures lack: (i) they never decrease when a new unshared allele is added to a population, and (ii) they never decrease when some copies of a shared allele are replaced by copies of an unshared allele. Chao et al. (2015) provide examples showing that the commonly used differentiation measures of order q = 2, such as GST and Jost's D, do not possess any of these two properties.

Other uniform analyses of diversity based on Hill numbers focus on a two-level hierarchy (community and meta-community) and provide measures that could be applied to species abundance and allele count data, as well as species distance matrices and functional data (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner, Kosman, Presley, & Willig, 2017a,b). However, ours is the only one that presents a framework that can be applied to hierarchical systems with an arbitrary number of levels and can be used to derive proper differentiation measures in the range [0, 1] at each level with desirable monotonicity and "true dissimilarity" properties (Appendix S3). Therefore, our proposed beta diversity of order q = 1 at each level is always interpretable and realistic, and

FIGURE 6 Diagrammatic representation of the hierarchical structure underlying the Hawaiian coral reef database, showing observed species/allelic richness (in parentheses) for the Hawaiian coral fish species. (a) Species richness; (b) allelic richness for Etelis coruscans; (c) allelic richness for Zebrasoma flavescens

[Figure 6 panels: (a) Species diversity; (b) Genetic diversity, E. coruscans; (c) Genetic diversity, Z. flavescens]

TABLE 5 Decomposition of genetic diversity of order q = 1 and differentiation measures for Etelis coruscans. Values correspond to averages over 10 loci

Level                       Diversity
3 Hawaiian Archipelago      $D_{\gamma} = 8.249$
2 Region                    $D_{\gamma}^{(2)} = D_{\gamma}$, $D_{\alpha}^{(2)} = 8.083$, $D_{\beta}^{(2)} = 1.016$
1 Island (population)       $D_{\gamma}^{(1)} = D_{\alpha}^{(2)}$, $D_{\alpha}^{(1)} = 7.077$, $D_{\beta}^{(1)} = 1.117$

Differentiation among aggregates at each level
2 Region                    $\Delta_D^{(2)} = 0.023$
1 Island (community)        $\Delta_D^{(1)} = 0.062$


our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.

Recently, Karlin and Smouse (2017, Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure "true dissimilarity". Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
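
The 60% figure in this example can be reproduced with a few lines of R. The sketch below is an illustration written for this passage, assuming equal population weights and a two-level hierarchy; the variable names are hypothetical, and it computes only the normalized Shannon differentiation described here, not Karlin and Smouse's measure.

```r
# Two populations with 10 equally frequent alleles each, 4 of them shared.
S_each   <- 10
S_shared <- 4
S_unique <- S_each - S_shared

# Frequencies over the 16 distinct alleles: population I carries alleles 1-10,
# population II carries alleles 7-16, so alleles 7-10 are shared.
p1 <- c(rep(1 / S_each, S_each), rep(0, S_unique))
p2 <- c(rep(0, S_unique), rep(1 / S_each, S_each))
w  <- c(0.5, 0.5)   # equal population weights (an assumption of this sketch)

shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

H_gamma <- shannon(w[1] * p1 + w[2] * p2)             # entropy of the pooled population
H_alpha <- w[1] * shannon(p1) + w[2] * shannon(p2)    # mean within-population entropy

# Normalized differentiation: excess entropy divided by its maximum, -sum(w * log(w)).
Delta <- (H_gamma - H_alpha) / shannon(w)
print(Delta)                      # proportion of nonshared alleles
print(1 - S_shared / S_each)      # the "true dissimilarity" target 1 - A/S
```

Here the pooled population has 4 alleles at frequency 0.1 and 12 at frequency 0.05, so the excess entropy equals 0.6 ln 2 and dividing by the maximum ln 2 gives 0.6.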

In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would be surely observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.

Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity Dβ and differentiation ΔD measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.

The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species, and therefore they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.

Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural

TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

Level                       Diversity
3 Hawaiian Archipelago      $D_{\gamma} = 8.404$
2 Region                    $D_{\gamma}^{(2)} = D_{\gamma}$, $D_{\alpha}^{(2)} = 8.290$, $D_{\beta}^{(2)} = 1.012$
1 Island (community)        $D_{\gamma}^{(1)} = D_{\alpha}^{(2)}$, $D_{\alpha}^{(1)} = 7.690$, $D_{\beta}^{(1)} = 1.065$

Differentiation among aggregates at each level
2 Region                    $\Delta_D^{(2)} = 0.014$
1 Island (community)        $\Delta_D^{(1)} = 0.033$


populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization, and therefore they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.

In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al.'s (2010) phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al.'s (2010) measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals, or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.

There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level is transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithm may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.

The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.

An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.

ACKNOWLEDGEMENTS

This work was assisted through participation in the "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from the NOAA Coral Reef Conservation Program. O. E. G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A. C. and C. H. C. were supported by the Ministry of Science and Technology, Taiwan. P. P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K. A. S. was supported by the National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.

DATA ARCHIVING STATEMENT

All data used in this manuscript are available in DRYAD (https://doi.org/dx.doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).

ORCID

Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493

Christine Edwards http://orcid.org/0000-0001-8837-4872

REFERENCES

Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2

Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101

Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.

Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis “marshi” (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665

Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.

Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.

Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654

Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551

Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247

Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272

Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540

Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349

Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471

Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014

Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819

Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083

Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1

Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2

Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036

Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75

Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352

Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x

Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1

Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x

Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1

Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x

Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum x falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.


Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.

Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.

Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3

Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x

Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.

Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299

Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x

Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x

Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.

Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x

Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.

Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.

Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696

Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B: Biological Sciences, 283(1829).

Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a “pristine” coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z

Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765

Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x

Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.

Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422

Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756

de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705

Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.

Whitlock, M. C. (2011). G’ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x

Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516

Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.

Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593


APPENDIX S1 Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an “ideal” population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the “effective” number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
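The equivalence in the table can be checked numerically. The short R sketch below assumes the standard definitions, which this appendix does not spell out: expected heterozygosity He = 1 − Σ p_i², and effective number of alleles 1 / (1 − He). The function names are illustrative only.

```r
# Allele frequencies from the Appendix S1 table
pop1 <- c(A1 = 0.5, A2 = 0.5, A3 = 0, A4 = 0, A5 = 0)
pop2 <- c(A1 = 0.01, A2 = 0.10, A3 = 0.665, A4 = 0.215, A5 = 0.01)

# Expected heterozygosity (assumed definition): He = 1 - sum(p_i^2)
expected_het <- function(p) 1 - sum(p^2)

# Effective number of alleles (assumed definition): the number of equally
# frequent alleles that would give the same heterozygosity, 1 / (1 - He)
effective_alleles <- function(p) 1 / (1 - expected_het(p))

he <- c(pop1 = expected_het(pop1), pop2 = expected_het(pop2))
ne <- c(pop1 = effective_alleles(pop1), pop2 = effective_alleles(pop2))

print(round(he, 3))
print(round(ne, 2))
```

Both populations come out with He close to 0.5 and an effective number of alleles close to 2, even though population 2 carries five distinct alleles.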

APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol    Definition

${}^{q}D$    abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$    phylogenetic diversity of order q
$S$    total number of distinct elements = species/alleles
$H$    Shannon entropy
$K$    number of regions
$J_k$    number of local populations/communities within region k
$w_{jk}$    weight given to population/community j of region k
$w_{+k}$    weight given to region k
$D_\alpha^{(l)}$    alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$    beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$    gamma abundance diversity at level l of the hierarchy
$l$    superscript to denote population/community, subregion, region, …
$D_\gamma$    gamma abundance diversity at the ecosystem level
$N_{injk}$    number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$    total number of copies of allele/species i in population/community j of region k
$N_{+jk}$    total number of alleles/individuals in population j of region k
$N_{++k}$    total number of alleles/individuals in region k
$N_{+++}$    total number of alleles/individuals in the ecosystem
$p_{i|jk}$    frequency of allele/species i in population j of region k
$p_{i|+k}$    frequency of allele/species i in region k
$p_{i|++}$    frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$    alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$    alpha entropy for region k
$H_\alpha^{(l)}$    total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$    abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$    number of branch segments in the phylogenetic tree
$L_i$    length of branch i = 1, 2, 3, ⋯, B
$a_i$    total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$    mean branch length
$T$    depth of an ultrametric tree ($= \bar{T}$)
$Q$    Rao’s quadratic entropy
$I$    phylogenetic entropy
$a_{i|jk}$    total relative abundance of allele/species descended from node i in population/community j of region k
$a_{i|+k}$    total relative abundance of allele/species descended from node i across all populations/communities in region k
$a_{i|++}$    total relative abundance of allele/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$    alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$    beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$    gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$    gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$    alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$    alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$    total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$    phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)


APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term “allele frequency” with “species frequency”. We first present the details for the two-level hierarchy and then extend all procedures to the three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1. For the gamma and alpha entropies defined above for the two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

$$H_\gamma = -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)\Big] \le -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\big(w_j p_{i|j}\big)\Big]$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
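As an illustration of the two boundary cases (a worked example added here under simple assumptions, not part of the original derivation), take J = 2 populations with arbitrary weights $w_1 + w_2 = 1$ and S = 2 alleles.

$$\text{Completely distinct: } p_{1|1} = 1,\; p_{2|2} = 1 \;\Rightarrow\; p_{1|+} = w_1,\; p_{2|+} = w_2,$$

$$H_\gamma = -w_1 \ln w_1 - w_2 \ln w_2, \qquad H_\alpha = 0, \qquad H_\gamma - H_\alpha = -\sum_{j=1}^{2} w_j \ln w_j.$$

$$\text{Identical: } p_{i|1} = p_{i|2} = p_i \;\Rightarrow\; p_{i|+} = p_i, \qquad H_\gamma = H_\alpha = -\sum_{i=1}^{2} p_i \ln p_i, \qquad H_\gamma - H_\alpha = 0.$$

The first case attains the upper bound of Theorem S3.1 and the second attains the lower bound.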


From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and “true dissimilarity” properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation measure is always non-decreasing when a new allele is added to a single population with any abundance. Here, the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the “true dissimilarity” property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.
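Property (B) can be illustrated with a small numerical sketch in R. The helper names (`shannon_entropy`, `shannon_differentiation`) and the example communities are hypothetical, and equal community weights are assumed here; the code simply evaluates Eq. (C1) on a constructed abundance matrix.

```r
# Shannon entropy of a probability vector (zero entries are skipped)
shannon_entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Shannon differentiation for a two-level hierarchy, following Eq. (C1).
# abun: alleles/species in rows, populations/communities in columns.
# w: population weights summing to 1.
shannon_differentiation <- function(abun, w) {
  p <- sweep(abun, 2, colSums(abun), "/")   # within-population frequencies p_{i|j}
  p_pooled <- as.vector(p %*% w)            # pooled frequencies p_{i|+}
  h_gamma <- shannon_entropy(p_pooled)
  h_alpha <- sum(w * apply(p, 2, shannon_entropy))
  (h_gamma - h_alpha) / shannon_entropy(w)
}

# Hypothetical example: 3 communities, each with S = 5 equally common species;
# A = 2 species are shared by all, the other 3 are unique to each community.
S <- 5; A <- 2; n_comm <- 3
n_species <- A + n_comm * (S - A)
abun <- matrix(0, nrow = n_species, ncol = n_comm)
abun[seq_len(A), ] <- 1
for (j in seq_len(n_comm)) {
  rows <- A + (j - 1) * (S - A) + seq_len(S - A)
  abun[rows, j] <- 1
}
w <- rep(1 / n_comm, n_comm)   # equal weights (an assumption of this sketch)

delta <- shannon_differentiation(abun, w)
print(c(delta = delta, expected = 1 - A / S))
```

Under these assumptions the computed differentiation coincides with 1 − A/S, the proportion of non-shared species per community.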

Proof. For Part (A), see Appendix S6 in Chao, Jost et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information; $\big[H_\alpha^{(2)} - H_\alpha^{(1)}\big]$ denotes the among-population information within a region; and $\big[H_\gamma - H_\alpha^{(2)}\big]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in the three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the “gamma” entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big[\Big(\sum_{k=1}^{K} w_{+k} p_{i|+k}\Big) \ln\Big(\sum_{m=1}^{K} w_{+m} p_{i|+m}\Big)\Big].$$

The corresponding “alpha” entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele i in population j, with population weights $(w_{1k}/w_{+k},\, w_{2k}/w_{+k},\, \ldots,\, w_{J_k k}/w_{+k})$, $j = 1, 2, \ldots, J_k$. The “gamma” entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding “alpha” entropy is $\sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} (w_{jk}/w_{+k}) \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}$. Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \tag{C6}$$

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared species). Note that in the latter case, we can decompose the gamma diversity as

$$H_\gamma = -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\big(w_{jk} p_{i|jk}\big)\Big]$$

$$= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big(\ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big)\Big]$$

$$= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk}\Big] - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \frac{w_{jk}}{w_{+k}}\Big] - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k}\Big]$$

$$= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k}$$

$$\equiv H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big].$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and “true dissimilarity” properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The gamma and alpha phylogenetic entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i\, a_{i|++} \ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i\, a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i\, a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[I_\alpha^{(2)} - I_\alpha^{(1)}\big] + \big[I_\gamma - I_\alpha^{(2)}\big]. \tag{C7}$$


Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information; $\big[I_\alpha^{(2)} - I_\alpha^{(1)}\big]$ denotes the among-population phylogenetic information within a region; and $\big[I_\gamma - I_\alpha^{(2)}\big]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and “true dissimilarity” properties, are parallel to those of allelic diversity and thus are omitted.
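For reference, dividing each component by its upper bound in Eqs. (C8) and (C9), in the same way that (C5) and (C6) were obtained from (C3) and (C4), gives the following forms. They are written out here as a sketch and should agree with Table 3 of the main text, which is not reproduced in this appendix:

$$\Delta_{PD}^{(2)} = \frac{I_\gamma - I_\alpha^{(2)}}{-T \sum_{k=1}^{K} w_{+k} \ln w_{+k}}, \qquad \Delta_{PD}^{(1)} = \frac{I_\alpha^{(2)} - I_\alpha^{(1)}}{-T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}.$$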


SUPPLEMENTARY INFORMATION

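Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represent the ecosystem. Shannon entropies, $H_{*}$, are calculated from allele/species abundances at each level, h, of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 image not reproduced. Rows: 3 Ecosystem; 2 Region; 1 Population or Community. Columns: Within, Between, Total, Decomposition. Quantities shown: population frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$; region frequencies $p_{i|+1}$, $p_{i|+2}$; ecosystem frequency $p_{i|++}$; entropies $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$, $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$, $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$, $H_\alpha^{(1)}$, $H_\alpha^{(2)}$ and $H_\gamma$.]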


Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/alleles abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2 image not reproduced. Tips: p1, p2, p3, p4, p5; internal nodes: p1 + p2, p1 + p2 + p3, p4 + p5; root: MRCA.]
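The extended abundance set described in the caption can be written out in a few lines of R. The tip abundances and branch lengths below are hypothetical values chosen only so that the tree is ultrametric with depth 3; the entropy formula is the one used for $I_\gamma$ in Appendix S3.

```r
# Hypothetical relative abundances of the 5 tip species (they sum to 1)
p <- c(0.30, 0.20, 0.10, 0.25, 0.15)

# Extended abundance set for the 8 nodes/branches of Figure S2:
# the 5 tips, then the internal nodes (1,2), (1,2,3) and (4,5)
a <- c(p, p[1] + p[2], p[1] + p[2] + p[3], p[4] + p[5])

# Hypothetical branch lengths, in the same order as `a`, for an
# ultrametric tree of depth 3 with topology (((1,2),3),(4,5))
L <- c(1, 1, 2, 1.5, 1.5, 1, 1, 1.5)
tree_depth <- 3

# Abundance-weighted mean branch length; for an ultrametric tree
# it equals the tree depth T
mean_T <- sum(L * a)

# Phylogenetic entropy I = -sum_i L_i * a_i * log(a_i)
phylo_entropy <- -sum(L * a * log(a))

print(c(mean_T = mean_T, I = phylo_entropy))
```

The check that `mean_T` equals the tree depth is a convenient sanity test that branch lengths and node abundances are listed in matching order.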


R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called “Abun” and “Struc”, respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle “blank” or “NA” entry. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1
        Population 1
        Population 2
    Region 2
        Population 3
        Population 4
        Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):


           Pop1   Pop2   Pop3   Pop4   Pop5
Allele1       1     16      2     10     15
Allele2       0      0      0      5     14
Allele3       7     12     11      1      0
Allele4       0      5     14      1     21
Allele5       2      1      0     11     10
Allele6       0      1      3      2      0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

          Pop1          Pop2          Pop3          Pop4          Pop5
Level3    Ecosystem     Ecosystem     Ecosystem     Ecosystem     Ecosystem
Level2    Region1       Region1       Region2       Region2       Region2
Level1    Population1   Population2   Population3   Population4   Population5

For the above simple example, input data for R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3), c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
Struc = rbind(rep("Ecosystem", 5), c(rep("Region1", 2), rep("Region2", 3)), paste0("Population", 1:5))

# Run R function
iDIP(Data, Struc)


Output is given below.

                     [,1]
D_gamma             5.272
D_alpha.2           4.679
D_alpha.1           3.540
D_beta.2            1.127
D_beta.1            1.322
Proportion.2        0.300
Proportion.1        0.700
Differentiation.2   0.204
Differentiation.1   0.310

We give simple interpretations for the above output (all the “effective” is in a sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 is interpreted as that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha.2 = 4.679 is interpreted as that each region contains 4.679 allele equivalents. D_beta.2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).

(1c) D_alpha.1 = 3.540 is interpreted as that each population within a region contains 3.540 allele equivalents. D_beta.1 = 1.322 is interpreted as that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha.2).

(2) Proportion.2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion.1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation.2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation.1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
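As a cross-check on the reported values, the entropies and the two differentiation measures of Eqs. (C5) and (C6) can be computed directly from the example data. This is a minimal sketch, not the iDIP function itself: the helper `entropy` and the `region` vector are illustrative names, and population weights are assumed to be relative sample sizes.

```r
# Example data from the text: alleles in rows, populations in columns
abun <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)   # region membership of the 5 populations

# Shannon entropy of a vector of counts or weights (normalised internally)
entropy <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

w_jk <- colSums(abun) / sum(abun)           # population weights (assumed: relative sizes)
w_k  <- tapply(w_jk, region, sum)           # region weights w_{+k}
abun_region <- t(rowsum(t(abun), region))   # allele counts pooled within each region

h_gamma  <- entropy(rowSums(abun))
h_alpha2 <- sum(w_k * apply(abun_region, 2, entropy))
h_alpha1 <- sum(w_jk * apply(abun, 2, entropy))

# Eq. (C5): differentiation among regions
diff2 <- (h_gamma - h_alpha2) / entropy(w_k)
# Eq. (C6): the denominator -sum w_jk * log(w_jk / w_+k) equals
# entropy(w_jk) - entropy(w_k)
diff1 <- (h_alpha2 - h_alpha1) / (entropy(w_jk) - entropy(w_k))

result <- c(D_gamma = exp(h_gamma), D_alpha2 = exp(h_alpha2),
            D_alpha1 = exp(h_alpha1),
            Differentiation2 = diff2, Differentiation1 = diff1)
print(round(result, 3))
```

With these weights the sketch should agree with the output listed above to about three decimals, which suggests that the example was run with weights proportional to population sample sizes.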

R code for Information-based Diversity Partitioning for Any Number of Levels

# Input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community
#          frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix)
# Output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level
#   (2) Proportion of total beta information found at each level
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.

# Main function: iDIP

iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga > 0] / n                 # pooled (ecosystem) relative frequencies
  G = sum(-gp * log(gp))              # gamma entropy
  S = length(gp)

  H = nrow(struc)                     # number of hierarchical levels
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  # Lowest level: individual populations/communities
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H - 1] = sum(wi * Ai)

  # Intermediate levels: pool the populations belonging to each aggregate
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          ai[, j] = abun[, II]
        } else {
          ai[, j] = rowSums(abun[, II])
        }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i - 1] = sum(wi * Ai)
    }
  }

  total = G - A[H - 1]                # total beta information
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }

  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha", (H - 1):1),
                     paste0("D_beta", (H - 1):1),
                     paste0("Proportion", (H - 1):1),
                     paste0("Differentiation", (H - 1):1))
  return(out)
}
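For readers who want to see the core calculation without the loop over levels, the two-level case (one ecosystem, several populations) can be sketched as follows. This is a minimal, self-contained illustration, not a replacement for iDIP; the count matrix and the helper name `shannon` are hypothetical.

```r
# Hypothetical alleles-by-population counts: 3 alleles (rows), 2 populations (columns).
abun <- cbind(c(8, 2, 0), c(0, 2, 8))

# Shannon entropy of a frequency vector, ignoring zero entries.
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

w       <- colSums(abun) / sum(abun)               # population weights
H_gamma <- shannon(rowSums(abun) / sum(abun))      # entropy of the pooled ecosystem
H_alpha <- sum(w * apply(abun, 2, function(x) shannon(x / sum(x))))  # weighted mean within-population entropy
H_w     <- shannon(w)                              # entropy of the weights (normalizing constant)

D_gamma <- exp(H_gamma)                            # effective number of alleles in the ecosystem
D_alpha <- exp(H_alpha)                            # effective number of alleles per population
D_beta  <- D_gamma / D_alpha                       # effective number of populations
Delta   <- (H_gamma - H_alpha) / H_w               # differentiation among populations

round(c(D_gamma = D_gamma, D_alpha = D_alpha, D_beta = D_beta, Delta = Delta), 3)
```

In this toy example each population shares one allele, carried by 20% of its gene copies, with the other population, so the differentiation is 0.8.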

Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) described for species diversity, we also need to input a phylogenetic "Tree" in Newick tree format.

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.

InputdataData=cbind(c(107020)c(16012511)c(20111403)c(10511112)c(1514021100))

rownames(Data) = paste0("Allele", 1:6)

Struc = rbind(rep("Ecosystem", 5), c(rep("Region1", 2), rep("Region2", 3)), paste0("Community", 1:5))

Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

Run R function:

iDIPphylo(Data, Struc, Tree)

Output is given below.

                [,1]
Faith's PD   321.525
mean_T        94.169
PD_gamma     274.388
PD_alpha2    255.194
PD_alpha1    223.231
PD_beta2       1.075
PD_beta1       1.143
PD_prop2       0.351
PD_prop1       0.649
PD_diff2       0.124
PD_diff1       0.149

The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages, communities, or regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194.

PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma), up to rounding.

(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231.

PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2), up to rounding.

(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%.

PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%.

PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
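Two of the statements above can be checked by simple arithmetic. The sketch below assumes that the nine branch lengths of the example tree are read as shown (the decimal placement is our reading of the Newick string) and uses the rounded output values, so agreement is only expected up to rounding.

```r
# Branch lengths of the example tree (six tips and three internal branches);
# the decimal placement is an assumption about how the Newick string should be read.
branch <- c(16.66254448, 28.86156926, 43.70264926, 59.19367445, 43.49065302,
            9.67060281, 49.65919121, 15.361314, 54.92297125)

# Faith's PD is the plain sum of all branch lengths (all six alleles are present).
faith_pd <- sum(branch)

# Rounded values from the iDIPphylo output.
PD_gamma  <- 274.388
PD_alpha1 <- 223.231
PD_beta1  <- 1.143
PD_beta2  <- 1.075
mean_T    <- 94.169

# Multiplicative chain: PD_gamma = PD_alpha1 * PD_beta1 * PD_beta2.
chain <- PD_alpha1 * PD_beta1 * PD_beta2

# Dividing by mean_T converts effective branch length into an effective number of lineages.
lineages_gamma <- PD_gamma / mean_T

round(c(faith_pd = faith_pd, chain = chain, lineages_gamma = lineages_gamma), 3)
```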

R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

# Input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community
#          frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix)
#   tree:  a Newick-format phylogenetic tree spanned by all focal species considered
#          in a study
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA"
#       in your data by 0 or any numerical value. Also, there must be at least one
#       species/allele in a population or a community.
# Output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root
#       node to each of the tips in a phylogenetic tree. For an ultrametric tree,
#       mean_T = tree depth
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD
#       for each level
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta
#       information found in Level 1 and Level 2, respectively
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example,
#       PD_diff1 measures the mean phylogenetic dissimilarity among communities
#       (Level 1) within a region (Level 2); PD_diff2 measures the mean phylogenetic
#       dissimilarity among regions (Level 2), etc.

# Three packages, "ade4", "ape" and "phytools", must be installed first.
install.packages("ade4"); library(ade4)
install.packages("ape"); library(ape)
install.packages("phytools"); library(phytools)

# Main function: iDIPphylo

iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, b] = 1 if branch/node b lies on the path from the root to leaf i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }

  # pA: abundance descended from each branch/node, per population
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) pA[, i] = Temp[, i] %*% M
  pB = c(phyloData$leaves, phyloData$nodes)     # branch lengths
  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n
  TT = sum(gp * pB)                             # mean_T
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])                          # Faith's PD

  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H - 1] = sum(wi * Ai)

  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else {
          pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
        }
      }
      # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = ni / sum(ni)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i - 1] = sum(wi * Ai)
    }
  }

  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }

  # Gamma = exp(G) * TT; Alpha = exp(A) * TT
  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  # out1 = iDIP(abun, struc); out = cbind(out1, out2)
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H - 1):1),
                     paste0("PD_beta", (H - 1):1),
                     paste0("PD_prop", (H - 1):1),
                     paste0("PD_diff", (H - 1):1))
  return(out)
}
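The gamma term computed inside the function, exp(G), can be isolated as a small helper to see what "effective total branch length" means. The sketch below is self-contained and does not use ade4; the function name `pd_order1` and the star-shaped toy tree are hypothetical. For a star tree with equally abundant tips, the order-1 phylogenetic diversity should equal the plain sum of branch lengths.

```r
# Order-1 phylogenetic diversity from branch lengths L and the relative
# abundance a descended from each branch (same formula as G in the function above).
pd_order1 <- function(L, a) {
  Tbar <- sum(L * a)                      # abundance-weighted mean root-to-tip distance
  keep <- a > 0
  exp(-sum(L[keep] * a[keep] / Tbar * log(a[keep] / Tbar)))
}

# Hypothetical star tree: 4 tips, each on its own branch of length 2.5,
# each carrying one quarter of the total abundance.
L <- rep(2.5, 4)
a <- rep(0.25, 4)

pd    <- pd_order1(L, a)                  # effective total branch length
Tbar  <- sum(L * a)                       # equals the tree depth, 2.5
n_eff <- pd / Tbar                        # effective number of lineages
c(pd = pd, Tbar = Tbar, n_eff = n_eff)
```

Here the effective total branch length equals Faith's PD (4 x 2.5 = 10) and the effective number of lineages equals the number of tips, as expected for equally abundant and equally divergent lineages.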


our differentiation measures can be compared among hierarchical levels and across different studies. Nevertheless, other existing frameworks based on Hill numbers may be extended to make them applicable to more complex hierarchical systems by focusing on diversities of order q = 1.

Recently, Karlin and Smouse (2017; Appendix S1) derived information-based differentiation measures to describe the genetic structure of a hierarchically structured population. Their measures are also based on Shannon entropy/diversity, but they differ in two important aspects from our measures. Firstly, our proposed differentiation measures possess the "true dissimilarity" property (Chao et al., 2014; Wolda, 1981), whereas theirs do not. In ecology, the property of "true dissimilarity" can be enunciated as follows: if N communities each have S equally common species, with exactly A species shared by all of them, and with the remaining species in each community not shared with any other community, then any sensible differentiation measure must give 1 − A/S, the true proportion of nonshared species in a community. Karlin and Smouse's (2017) measures are useful in quantifying other aspects of differentiation among aggregates, but do not measure "true dissimilarity." Consider a simple example: populations I and II each have 10 equally frequent alleles, with 4 shared; then, intuitively, any differentiation measure must yield 60%. However, Karlin and Smouse's measure in this simple case yields 31.96%; on the other hand, ours gives the true nonshared proportion of 60%. The second important difference is that, when there are only two levels, our information-based differentiation measure reduces to the normalized mutual information (Shannon differentiation), whereas theirs does not. Sherwin (2010) indicated that the mutual information is linearly related to the chi-square statistic for testing allelic differentiation between populations. Thus, our measures can be linked to the widely used chi-square statistic, whereas theirs cannot.
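The two-population example can be reproduced directly. The sketch below assumes equal population weights and codes the 16 distinct alleles by position; it computes the two-level Shannon differentiation (excess of pooled over mean within-population entropy, divided by the entropy of the weights) and compares it with 1 − A/S. Variable names are ours.

```r
# Shannon entropy of a frequency vector, ignoring zero entries.
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# 16 distinct alleles in total: population I carries alleles 1-10,
# population II carries alleles 7-16, so alleles 7-10 are the 4 shared ones.
p1 <- c(rep(0.1, 10), rep(0, 6))
p2 <- c(rep(0, 6), rep(0.1, 10))
w  <- c(0.5, 0.5)                                   # equal weights (assumed)

H_gamma <- shannon(w[1] * p1 + w[2] * p2)           # entropy of the pooled population
H_alpha <- w[1] * shannon(p1) + w[2] * shannon(p2)  # mean within-population entropy
delta   <- (H_gamma - H_alpha) / shannon(w)         # Shannon differentiation

A <- 4; S <- 10
c(delta = delta, true_nonshared = 1 - A / S)
```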

In this paper, all diversity measures (alpha, beta and gamma diversities) and differentiation measures are derived conditional on knowing true species richness and species abundances. In practice, species richness and abundances are unknown; all measures need to be estimated from sampling data. When there are undetected species or alleles in a sample, the undersampling bias for the measures of order q = 2 is limited because they are focused on the dominant species or alleles, which would be surely observed in any sample. For information-based measures, it is well known that the observed entropy/diversity (i.e., by substituting species sample proportions into the entropy/diversity formulas) exhibits negative bias to some extent. Nevertheless, the undersampling bias can be largely reduced by novel statistical methods proposed by Chao and Jost (2015). In our real data analysis, statistical estimation was not applied because the patterns based on the observed and estimated values are generally consistent. When communities or populations are severely undersampled, statistical estimation should be applied to reduce undersampling bias. A more thorough discussion of the statistical properties of our measures will be presented in a separate study. Here, our objective was to introduce the information-based framework and explain how it can be applied to real data.
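The negative bias of the observed (plug-in) entropy is easy to see in a small simulation. The sketch below uses a hypothetical community of 20 equally abundant species and samples of 30 individuals; it illustrates the bias only and does not implement the Chao and Jost (2015) estimator.

```r
set.seed(1)                       # for reproducibility of this illustration
S <- 20                           # hypothetical number of equally abundant species
n <- 30                           # individuals per sample (severe undersampling)
true_H <- log(S)                  # true Shannon entropy of the community

# Plug-in entropy: substitute sample proportions into the entropy formula.
plug_in <- replicate(500, {
  counts <- tabulate(sample.int(S, n, replace = TRUE), nbins = S)
  p <- counts[counts > 0] / n
  -sum(p * log(p))
})

mean_bias <- mean(plug_in) - true_H          # negative: diversity is underestimated
c(true_H = true_H, mean_plug_in = mean(plug_in), mean_bias = mean_bias)
```

With equally abundant species the plug-in value can never exceed log(S), so every replicate underestimates the true entropy; with uneven abundances the bias is negative on average rather than in every sample.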

Our simulation study clearly shows that the diversity measures derived from our framework can accurately describe complex hierarchical structures. For example, our beta diversity Dβ and differentiation ΔD measures can uncover the increase in differentiation between marginal and well-connected subregions within a region as spatial correlation across populations (controlled by the parameter δ in our simulations) diminishes (Figure 3). Indeed, the strength of the hierarchical structure varies in a complex way with δ. Structuring within regions declines steadily as δ increases, but structuring between subregions within a region first increases and then decreases as δ increases (see Figure 2). Nevertheless, for very large values of δ, hierarchical structuring disappears completely across all levels, generating spatial genetic patterns similar to those observed for the island model. A more detailed explanation of the mechanisms involved is presented in the results section.

The application of our framework to the Hawaiian coral reef data allows us to demonstrate the intuitive and straightforward interpretation of our diversity measures in terms of effective number of components. The data sets consist of 10 and 13 microsatellite loci, covering only a small fraction of the genome of the studied species. However, more extensive data sets consisting of dense SNP arrays are quickly being produced thanks to the use of next-generation sequencing techniques. Although SNPs are bi-allelic, they can be generated in very large numbers covering the whole genome of a species and, therefore, they are more representative of the diversity maintained by a species. Additionally, the simulation study shows that the analysis of bi-allelic data sets using our framework can uncover complex spatial structures. The R package we provide will greatly facilitate the application of our approach to these new data sets.

Our framework provides a consistent and detailed characterization of biodiversity at all levels of organization, which can then be used to uncover the mechanisms that explain observed spatial and temporal patterns. Although we still have to undertake a very thorough sensitivity analysis of our diversity measures under a wide range of ecological and evolutionary scenarios, the results of our simulation study suggest that diversity measures derived from our framework may be used as summary statistics in the context of Approximate Bayesian Computation methods (Beaumont, Zhang, & Balding, 2002) aimed at making inferences about the ecology and demography of natural populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

TABLE 6 Decomposition of genetic diversity of order q = 1 and differentiation measures for Zebrasoma flavescens. Values correspond to averages over 13 loci

Level                       Diversity
3 Hawaiian Archipelago      D_γ = 8.404
2 Region                    D_γ^(2) = D_γ, D_α^(2) = 8.290, D_β^(2) = 1.012
1 Island (community)        D_γ^(1) = D_α^(2), D_α^(1) = 7.690, D_β^(1) = 1.065

Differentiation among aggregates at each level
2 Region                    Δ_D^(2) = 0.014
1 Island (community)        Δ_D^(1) = 0.033

We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases, the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides "true diversity" measures that are consistent across levels of organization and, therefore, they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26%) than the difference in genetic diversity between these two levels (12.44% for E. coruscans and 7% for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.
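The inconsistency between richness-type and heterozygosity-type measures discussed above can be made concrete with Hill numbers of different orders computed on the same frequency vector. The sketch below uses a hypothetical, strongly uneven set of five frequencies; the function name `hill` is ours.

```r
# Hill number (effective number of elements) of order q for a frequency vector p.
hill <- function(p, q) {
  p <- p[p > 0]
  if (q == 1) exp(-sum(p * log(p))) else sum(p^q)^(1 / (1 - q))
}

# Hypothetical frequencies of five species (or alleles), dominated by one type.
p <- c(0.70, 0.20, 0.05, 0.03, 0.02)

d <- sapply(c(0, 1, 2), function(q) hill(p, q))
names(d) <- c("q0_richness", "q1_Shannon", "q2_heterozygosity_based")
round(d, 3)
```

Order 0 counts every element regardless of frequency, order 2 is dominated by the common elements, and order 1 weights each element in proportion to its frequency, which is why correlating a q = 0 measure at one level with a q = 2 measure at another mixes two different aspects of diversity.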

In our hierarchical framework and analysis based on the Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)'s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)'s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals, or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.

There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level is transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithm may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and "true dissimilarity" properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.

The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.

An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems). The framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.


ACKNOWLEDGEMENTS

This work was assisted through participation in the "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.

DATA ARCHIVING STATEMENT

All data used in this manuscript are available in DRYAD (https://doi.org/dx.doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).

ORCID

Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493

Christine Edwards http://orcid.org/0000-0001-8837-4872

REFERENCES

Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2

Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101

Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.

Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665

Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.

Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.

Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654

Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551

Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247

Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272

Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540

Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349

Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471

Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014

Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819

Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083

Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1

Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2

Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036

Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75

Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352

Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x

Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1

Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x

Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1

Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x

Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum x falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.


Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.

Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.

Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3

Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x

Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.

Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299

Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x

Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x

Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.

Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x

Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.

Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.

Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696

Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B-Biological Sciences, 283(1829).

Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z

Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765

Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x

Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.

Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422

Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756

de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705

Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.

Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x

Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516

Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.

Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593

APPENDIX S1 Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
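The equivalence stated in this example can be verified numerically. The sketch below computes the expected heterozygosity He = 1 − Σp² and the corresponding effective number of alleles 1/Σp² (the heterozygosity-based, order q = 2 measure) for both populations; the function names are ours, and the frequencies are those of the table read with the decimal points restored.

```r
# Allele frequencies from the table (A1..A5).
pop1 <- c(0.5, 0.5, 0, 0, 0)
pop2 <- c(0.01, 0.10, 0.665, 0.215, 0.01)

he  <- function(p) 1 - sum(p^2)   # expected heterozygosity
eff <- function(p) 1 / sum(p^2)   # effective number of alleles implied by He

res <- rbind(He = c(pop1 = he(pop1), pop2 = he(pop2)),
             effective_alleles = c(pop1 = eff(pop1), pop2 = eff(pop2)))
round(res, 3)
```

Population 2 has five alleles but, to within rounding of the tabulated frequencies, the same heterozygosity as the two-allele "ideal" population, hence an effective number of about two.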

APPENDIX S2. Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

    Symbol                      Definition
    $^{q}D$                     abundance diversity of order q, also referred to as Hill number of order q
    $^{q}PD$                    phylogenetic diversity of order q
    $S$                         total number of distinct elements (species/alleles)
    $H$                         Shannon entropy
    $K$                         number of regions
    $J_k$                       number of local populations/communities within region k
    $w_{jk}$                    weight given to population/community j of region k
    $w_{+k}$                    weight given to region k
    $D_\alpha^{(l)}$            alpha abundance diversity at level l of the hierarchy
    $D_\beta^{(l)}$             beta abundance diversity at level l of the hierarchy
    $D_\gamma^{(l)}$            gamma abundance diversity at level l of the hierarchy
    $l$                         superscript to denote population/community, subregion, region, …
    $D_\gamma$                  gamma abundance diversity at the ecosystem level
    $N_{injk}$                  number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
    $N_{ijk}$                   total number of copies of allele/species i in population/community j of region k
    $N_{+jk}$                   total number of alleles/individuals in population j of region k
    $N_{++k}$                   total number of alleles/individuals in region k
    $N_{+++}$                   total number of alleles/individuals in the ecosystem
    $p_{i|jk}$                  frequency of allele/species i in population j of region k
    $p_{i|+k}$                  frequency of allele/species i in region k
    $p_{i|++}$                  frequency of allele/species i in the ecosystem
    $H_{\alpha jk}^{(1)}$       alpha entropy for population j of region k
    $H_{\alpha k}^{(2)}$        alpha entropy for region k
    $H_\alpha^{(l)}$            total alpha entropy at level l of the hierarchy
    $\Delta_D^{(l)}$            abundance diversity differentiation among aggregates (l = populations/communities, regions)
    $B$                         number of branch segments in the phylogenetic tree
    $L_i$                       length of branch i = 1, 2, 3, …, B
    $a_i$                       total relative abundance of elements (alleles/species) descended from the ith node/branch
    $\bar{T}$                   mean branch length
    $T$                         depth of an ultrametric tree (= $\bar{T}$)
    $Q$                         Rao's quadratic entropy
    $I$                         phylogenetic entropy
    $a_{i|jk}$                  total relative abundance of alleles/species descended from node i in population/community j of region k
    $a_{i|+k}$                  total relative abundance of alleles/species descended from node i across all populations/communities in region k
    $a_{i|++}$                  total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
    $PD_\alpha^{(l)}$           alpha phylogenetic diversity at level l of the hierarchy
    $PD_\beta^{(l)}$            beta phylogenetic diversity at level l of the hierarchy
    $PD_\gamma^{(l)}$           gamma phylogenetic diversity at level l of the hierarchy
    $PD_\gamma$                 gamma phylogenetic diversity at the ecosystem level
    $I_{\alpha jk}^{(1)}$       alpha phylogenetic entropy for population j of region k
    $I_{\alpha k}^{(2)}$        alpha phylogenetic entropy for region k
    $I_\alpha^{(l)}$            total alpha phylogenetic entropy at level l of the hierarchy
    $\Delta_{PD}^{(l)}$         phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)

APPENDIX S3. Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term “allele frequency” with “species frequency”. We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows. Because $\sum_{m=1}^{J} w_m p_{i|m} \ge w_j p_{i|j}$ for every j and the logarithm is increasing,

$$H_\gamma = -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)\Big] \le -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\big(w_j p_{i|j}\big)\Big]$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
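The two bounds of the theorem can be illustrated numerically. The sketch below is not part of the original appendix: the helper names `shannon` and `entropy_excess`, the weights and the three frequency matrices are all hypothetical choices made only to exercise the identical, completely distinct and partially shared cases.

```r
# Shannon entropy (natural log) of a frequency vector, ignoring zeros
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# H_gamma - H_alpha for a two-level hierarchy.
# freq: alleles in rows, populations in columns (each column sums to 1)
# w:    population weights (sum to 1)
entropy_excess <- function(freq, w) {
  h_gamma <- shannon(as.vector(freq %*% w))     # entropy of the pooled frequencies
  h_alpha <- sum(w * apply(freq, 2, shannon))   # weighted mean within-population entropy
  h_gamma - h_alpha
}

w <- c(0.3, 0.7)
upper <- shannon(w)                             # upper bound: -sum(w * log(w))

identical_pops <- cbind(c(0.2, 0.8, 0, 0), c(0.2, 0.8, 0, 0))
distinct_pops  <- cbind(c(0.2, 0.8, 0, 0), c(0, 0, 0.5, 0.5))
shared_pops    <- cbind(c(0.2, 0.8, 0, 0), c(0, 0.4, 0.3, 0.3))

excess <- c(identical = entropy_excess(identical_pops, w),
            distinct  = entropy_excess(distinct_pops, w),
            shared    = entropy_excess(shared_pops, w))

# Dividing by the upper bound rescales the excess to the range [0, 1]
print(excess / upper)
```

Identical populations sit at the lower bound, completely distinct populations reach the upper bound, and populations that share some alleles fall strictly in between.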

From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and “true dissimilarity” properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:

(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.

(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here, the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the “true dissimilarity” property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.
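As a worked instance of part (B), consider the simplest case of two communities. The derivation below is a sketch under the added assumption of equal weights $w_1 = w_2 = 1/2$ (the theorem itself is stated for multiple communities); it uses only the definitions of $H_\gamma$, $H_\alpha$ and Eq. (C1).

```latex
% Each community has S equally common species (frequency 1/S); A species are shared.
% Pooled frequencies: A shared species at 1/S, and 2(S - A) unshared species at 1/(2S).
\begin{align*}
H_\alpha &= \ln S, \\
H_\gamma &= A \cdot \frac{1}{S}\ln S + 2(S-A)\cdot\frac{1}{2S}\ln(2S)
          = \ln S + \Big(1 - \frac{A}{S}\Big)\ln 2, \\
\Delta_D &= \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{2} w_j \ln w_j}
          = \frac{(1 - A/S)\ln 2}{\ln 2} = 1 - \frac{A}{S}.
\end{align*}
% Example: S = 4 and A = 1 give \Delta_D = 0.75, i.e. three of the four species
% in each community are not shared.
```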

Proof. For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++},$$

$$H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the “gamma” entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} w_{+k} p_{i|+k}\Big) \ln\Big(\sum_{m=1}^{K} w_{+m} p_{i|+m}\Big).$$

The corresponding “alpha” entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele i in population j, with population weights $(w_{1k}/w_{+k}, w_{2k}/w_{+k}, \ldots, w_{J_k k}/w_{+k})$, j = 1, 2, …, $J_k$. The “gamma” entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding “alpha” entropy is $\sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} (w_{jk}/w_{+k}) \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}$. Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln\frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\big(w_{jk}/w_{+k}\big)}. \tag{C6}$$

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$H_\gamma = -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\big(w_{jk} p_{i|jk}\big)\Big]$$

$$= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big(\ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big)\Big]$$

$$= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk}\Big] - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\frac{w_{jk}}{w_{+k}}\Big] - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k}\Big]$$

$$= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k}$$

$$\equiv H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big].$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
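The special case above can be reproduced with a small numerical sketch. Everything in it is hypothetical and chosen only for illustration: three populations in two regions (labels `R1`, `R2`), arbitrary weights, and allele frequencies arranged so that no allele occurs in more than one population. Under those assumptions both normalized measures, Eq. (C5) and Eq. (C6), should equal 1 up to floating-point error.

```r
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# Hypothetical toy hierarchy: populations P1 and P2 belong to region R1, P3 to R2.
# Alleles in rows, populations in columns; no allele is shared between populations.
freq <- cbind(P1 = c(0.5, 0.5, 0,    0,    0,   0),
              P2 = c(0,   0,   0.25, 0.75, 0,   0),
              P3 = c(0,   0,   0,    0,    0.4, 0.6))
w_jk   <- c(P1 = 0.2, P2 = 0.3, P3 = 0.5)       # population weights w_jk
region <- c("R1", "R1", "R2")

w_k <- sapply(split(w_jk, region), sum)         # region weights w_+k

# Regional frequencies: p_i|+k = sum_j (w_jk / w_+k) * p_i|jk
freq_region <- sapply(names(w_k), function(k) {
  idx <- region == k
  as.vector(freq[, idx, drop = FALSE] %*% (w_jk[idx] / w_k[[k]]))
})

h_gamma  <- shannon(as.vector(freq %*% w_jk))
h_alpha2 <- sum(w_k * apply(freq_region, 2, shannon))
h_alpha1 <- sum(w_jk * apply(freq, 2, shannon))

# Normalized differentiation among regions, Eq. (C5)
delta2 <- (h_gamma - h_alpha2) / (-sum(w_k * log(w_k)))
# Normalized differentiation among populations within regions, Eq. (C6)
delta1 <- (h_alpha2 - h_alpha1) / (-sum(w_jk * log(w_jk / w_k[region])))

print(c(delta2 = delta2, delta1 = delta1))
```

Replacing any column of `freq` with a distribution that overlaps another population would move the corresponding measure below 1.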

As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and “true dissimilarity” properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. Phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i\, a_{i|++} \ln a_{i|++},$$

$$I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i\, a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i\, a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \big[I_\alpha^{(2)} - I_\alpha^{(1)}\big] + \big[I_\gamma - I_\alpha^{(2)}\big]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and “true dissimilarity” properties, are parallel to those of allelic diversity and thus are omitted.

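SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represent the ecosystem. Shannon entropies ($H_{*}$) are calculated from allele/species abundances at each level of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 diagram not reproduced. Rows: level 3 (Ecosystem), level 2 (Region), level 1 (Population or Community). Columns: Within, Between, Total, Decomposition. Labelled quantities: population frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$; regional frequencies $p_{i|+1}$, $p_{i|+2}$; ecosystem frequencies $p_{i|++}$; entropies $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$, $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$, $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$, $H_\alpha^{(1)}$, $H_\alpha^{(2)}$ and $H_\gamma$.]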

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\ p_1 + p_2,\ p_1 + p_2 + p_3,\ p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2 diagram not reproduced. Tips: $p_1$, $p_2$, $p_3$, $p_4$, $p_5$. Internal nodes: $p_1 + p_2$, $p_1 + p_2 + p_3$, $p_4 + p_5$. Root: MRCA.]
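The extended abundance set described in the caption can be built mechanically from the tip abundances. The sketch below is illustrative only: the tip abundances, the node labels `node6` to `node8` and the branch lengths are hypothetical (the figure gives no numerical values), and the order-1 phylogenetic diversity is computed as $T\exp(I/T)$, which is the transformation of the phylogenetic entropy assumed here.

```r
# Hypothetical relative abundances of the five tips (sum to 1)
p <- c(p1 = 0.10, p2 = 0.20, p3 = 0.30, p4 = 0.15, p5 = 0.25)

# Tips descended from each internal node of the tree in Figure S2
descendants <- list(node6 = c("p1", "p2"),
                    node7 = c("p1", "p2", "p3"),
                    node8 = c("p4", "p5"))

# Extended abundance set: tip abundances followed by internal-node abundances
a <- c(p, sapply(descendants, function(tips) sum(p[tips])))

# Hypothetical branch lengths (same order as 'a') giving an ultrametric tree of depth 3:
# every root-to-tip path sums to 3
L <- c(p1 = 1, p2 = 1, p3 = 2, p4 = 2, p5 = 2, node6 = 1, node7 = 1, node8 = 1)

tree_depth <- sum(L * a)                        # equals the depth T for an ultrametric tree
I_phylo    <- -sum(L * a * log(a))              # phylogenetic entropy
PD1        <- tree_depth * exp(I_phylo / tree_depth)  # order-1 phylogenetic diversity (assumed form)

print(a)
print(c(depth = tree_depth, I = I_phylo, PD1 = PD1, total_branch_length = sum(L)))
```

With these made-up values, the order-1 measure lies between the tree depth and the total branch length, as expected for an effective total branch length.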

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called “Abun” and “Struc”, respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

    Ecosystem
      Region1
        Population1
        Population2
      Region2
        Population3
        Population4
        Population5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

               Pop1  Pop2  Pop3  Pop4  Pop5
    Allele1       1    16     2    10    15
    Allele2       0     0     0     5    14
    Allele3       7    12    11     1     0
    Allele4       0     5    14     1    21
    Allele5       2     1     0    11    10
    Allele6       0     1     3     2     0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

               Pop1         Pop2         Pop3         Pop4         Pop5
    Level 3    Ecosystem    Ecosystem    Ecosystem    Ecosystem    Ecosystem
    Level 2    Region1      Region1      Region2      Region2      Region2
    Level 1    Population1  Population2  Population3  Population4  Population5

For the above simple example, input data for R function iDIP (code given below) are shown below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

    # Run R function
    iDIP(Data, Struc)

Output is given below:

                         [,1]
    D_gamma             5.272
    D_alpha.2           4.679
    D_alpha.1           3.540
    D_beta.2            1.127
    D_beta.1            1.322
    Proportion.2        0.300
    Proportion.1        0.700
    Differentiation.2   0.204
    Differentiation.1   0.310

We give simple interpretations for the above output (“effective” is always in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha.2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta.2 = 1.127 implies that there are 1.13 region equivalents. Thus, 4.679 × 1.127 = 5.272 (= D_gamma).

(1c) D_alpha.1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta.1 = 1.322 means that there are 1.32 population equivalents per region. Here, 1.322 × 3.540 = 4.679 allele equivalents per region (= D_alpha.2).

(2) Proportion.2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion.1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation.2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation.1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
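The region-level figures in this output can be cross-checked by hand without calling iDIP. The sketch below is an independent recomputation, assuming (as the iDIP code appears to do) that weights are proportional to the total counts in each region; the helper `shannon` and the variable names are illustrative.

```r
# Example allele counts (alleles in rows, populations in columns)
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c("Region1", "Region1", "Region2", "Region2", "Region2")

# Shannon entropy of a vector of raw counts
shannon <- function(x) { p <- x[x > 0] / sum(x); -sum(p * log(p)) }

# Pool raw counts within each region (alleles in rows, regions in columns)
region_counts <- sapply(unique(region), function(r)
  rowSums(Data[, region == r, drop = FALSE]))

# Region weights proportional to region size (assumption stated above)
w_region <- colSums(region_counts) / sum(Data)

H_gamma  <- shannon(rowSums(Data))
H_alpha2 <- sum(w_region * apply(region_counts, 2, shannon))

D_gamma  <- exp(H_gamma)                  # effective number of alleles in the ecosystem
D_alpha2 <- exp(H_alpha2)                 # allele equivalents per region
D_beta2  <- D_gamma / D_alpha2            # region equivalents
Diff2    <- (H_gamma - H_alpha2) / (-sum(w_region * log(w_region)))

print(round(c(D_gamma = D_gamma, D_alpha2 = D_alpha2,
              D_beta2 = D_beta2, Diff2 = Diff2), 3))
```

The multiplicative relation between alpha, beta and gamma holds exactly by construction, and the rounded values agree with the reported output to the precision shown there.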

R code for Information-based Diversity Partitioning for Any Number of Levels

    # Input data:
    #   abun:  alleles-by-population frequency matrix (or species-by-community
    #          frequency matrix).
    #   struc: level-by-population hierarchical structure matrix (or
    #          level-by-community hierarchical structure matrix).
    # Output:
    #   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
    #   (2) Proportion of total beta information found at each level.
    #   (3) Differentiation (dissimilarity) for each level. For example,
    #       Differentiation.1 measures the mean dissimilarity among populations
    #       (level 1) within a region; Differentiation.2 measures the mean
    #       dissimilarity among regions (level 2), etc.
    # NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace
    # "blank" or "NA" in your data by 0 or any numerical value. Also, there must
    # be at least one species/allele in a population or a community.

    # Main function iDIP
    iDIP = function(abun, struc) {
      n = sum(abun); N = ncol(abun)
      ga = rowSums(abun)
      gp = ga[ga > 0] / n                  # pooled (ecosystem) frequencies
      G = sum(-gp * log(gp))               # gamma entropy
      S = length(gp)

      H = nrow(struc)                      # number of hierarchical levels
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      # Lowest level: weights and alpha entropy of individual populations
      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[H - 1] = sum(wi * Ai)

      # Intermediate levels: pool populations belonging to the same aggregate
      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          ai = matrix(0, ncol = NN, nrow = nrow(abun))
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              ai[, j] = abun[, II]
            } else {
              ai[, j] = rowSums(abun[, II])
            }
          }
          pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
          wi = colSums(ai) / sum(ai)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
          A[i - 1] = sum(wi * Ai)
        }
      }

      # Differentiation, proportion of beta information, and beta diversity
      total = G - A[H - 1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      Gamma = exp(G); Alpha = exp(A)
      out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) <- c(paste0("D_gamma"),
                         paste0("D_alpha.", (H - 1):1),
                         paste0("D_beta.", (H - 1):1),
                         paste0("Proportion.", (H - 1):1),
                         paste0("Differentiation.", (H - 1):1))
      return(out)
    }

Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called “Abun” and “Struc”, respectively) as described for species diversity, we also need to input a phylogenetic “Tree” in Newick tree format.

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the “Abun” matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    rownames(Data) = paste0("Allele", 1:6)
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
    Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

    # Run R function
    iDIPphylo(Data, Struc, Tree)

Output is given below:

                     [,1]
    Faith's PD    321.525
    mean_T         94.169
    PD_gamma      274.388
    PD_alpha.2    255.194
    PD_alpha.1    223.231
    PD_beta.2       1.075
    PD_beta.1       1.143
    PD_prop.2       0.351
    PD_prop.1       0.649
    PD_diff.2       0.124
    PD_diff.1       0.149

The “effective” in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha.2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta.2 = 1.075 means that there are 1.08 region equivalents. Thus, 255.194 × 1.075 = 274.388 (= PD_gamma).

(3c) PD_alpha.1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta.1 = 1.143 implies that there are 1.14 population equivalents per region. Here, 223.231 × 1.143 = 255.194 (= PD_alpha.2).

(4) PD_prop.2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop.1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff.2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff.1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
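The first two output rows (Faith's PD and mean_T) can be checked with plain arithmetic, without the tree-handling packages. The sketch below rests on an assumption: the branch lengths are read from the example Newick string with the decimal points placed so that the total branch length matches the reported Faith's PD; the variable names are illustrative.

```r
# Terminal branch lengths (decimal placement is an assumption, see above)
tip_len <- c(Allele1 = 16.66254448, Allele2 = 28.86156926, Allele3 = 59.19367445,
             Allele4 = 9.67060281,  Allele5 = 49.65919121, Allele6 = 15.361314)

# Internal branch lengths
len_12  <- 43.70264926   # branch above the clade (Allele1, Allele2)
len_123 <- 43.49065302   # branch above the clade ((Allele1, Allele2), Allele3)
len_456 <- 54.92297125   # branch above the clade (Allele4, Allele5, Allele6)

# Faith's PD: total sum of branch lengths
faith_pd <- sum(tip_len) + len_12 + len_123 + len_456

# Root-to-tip distance of every allele
depth <- tip_len + c(len_12 + len_123, len_12 + len_123, len_123,
                     len_456, len_456, len_456)

# Pooled allele counts across the five communities of the example data
counts <- c(44, 19, 31, 41, 24, 6)

# mean_T: abundance-weighted mean of the root-to-tip distances
mean_T <- sum(depth * counts / sum(counts))

print(round(c(faith_pd = faith_pd, mean_T = mean_T), 3))
```

The root-to-tip distances differ among alleles, so this simulated tree is not ultrametric and mean_T is a genuine weighted average rather than a common tree depth.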

R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

    # Input data:
    #   abun:  alleles-by-population frequency matrix (or species-by-community
    #          frequency matrix).
    #   struc: level-by-population hierarchical structure matrix (or
    #          level-by-community hierarchical structure matrix).
    #   tree:  a Newick-format phylogenetic tree spanned by all focal species
    #          considered in a study.
    # NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank"
    # or "NA" in your data by 0 or any numerical value. Also, there must be at
    # least one species/allele in a population or a community.
    # Output:
    #   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
    #   (2) mean_T: weighted (by species abundance) mean of the distances from
    #       the root node to each of the tips in a phylogenetic tree. For an
    #       ultrametric tree, mean_T = tree depth.
    #   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and
    #       beta PD for each level.
    #   (4) PD_prop.1 and PD_prop.2 measure the proportions of total phylogenetic
    #       beta information found in Level 1 and Level 2, respectively.
    #   (5) Phylogenetic differentiation (dissimilarity) for each level. For
    #       example, PD_diff.1 measures the mean phylogenetic dissimilarity among
    #       communities (Level 1) within a region (Level 2); PD_diff.2 measures
    #       the mean phylogenetic dissimilarity among regions (Level 2), etc.

    # Three packages ("ade4", "ape" and "phytools") must be installed first
    install.packages("ade4")
    library(ade4)
    install.packages("ape")
    library(ape)
    install.packages("phytools")
    library(phytools)

    # Main function iDIPphylo
    iDIPphylo = function(abun, struc, tree) {
      phyloData <- newick2phylog(tree)
      Temp <- as.matrix(abun[names(phyloData$leaves), ])
      nodenames = c(names(phyloData$leaves), names(phyloData$nodes))

      # M[i, ] flags every node on the path from the root to leaf i
      M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
                 dimnames = list(names(phyloData$leaves), nodenames))
      for (i in 1:length(phyloData$leaves)) {
        M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
      }

      # pA: node-by-population abundances; pB: branch length of every node
      pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
                  dimnames = list(nodenames, colnames(abun)))
      for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
      pB = c(phyloData$leaves, phyloData$nodes)

      n = sum(abun); N = ncol(abun)
      ga = rowSums(pA)
      gp = ga / n; TT = sum(gp * pB)       # TT: abundance-weighted mean tree depth
      G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
      PD = sum(pB[gp > 0])                 # Faith's PD

      H = nrow(struc)
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      # Lowest level: weights and phylogenetic alpha entropy of each population
      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k)
        -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[H - 1] = sum(wi * Ai)

      # Intermediate levels: pool populations belonging to the same aggregate
      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
            } else {
              pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
            }
          }
          wi = ni / sum(ni)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k)
            -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
          A[i - 1] = sum(wi * Ai)
        }
      }

      # Differentiation, proportion of beta information, and beta diversity
      total = G - A[H - 1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      # Leftover lines in the extracted listing (an alternative scaling of Gamma
      # and Alpha by TT, a reassignment of pi from an undefined 'ai', and a
      # cbind with an undefined 'out2') cannot run as written and are left out.
      Gamma = exp(G); Alpha = exp(A)
      out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) <- c(paste("Faith's PD"),
                         paste("mean_T"),
                         paste0("PD_gamma"),
                         paste0("PD_alpha.", (H - 1):1),
                         paste0("PD_beta.", (H - 1):1),
                         paste0("PD_prop.", (H - 1):1),
                         paste0("PD_diff.", (H - 1):1))
      return(out)
    }


populations. For example, our approach provides locus-specific diversity measures that could be used to implement genome scan approaches aimed at detecting genomic regions subject to selection.

We expect our framework to have important applications in the domain of community genetics. This field is aimed at understanding the interactions between genetic and species diversity (Agrawal, 2003). A frequently used tool to achieve this goal is centred around the study of species–gene diversity correlations (SGDCs). There are now many studies that have assessed the relationship between species and genetic diversity (reviewed by Vellend et al., 2014), but they have led to contradictory results. In some cases the correlation is positive, in others it is negative, and in yet other cases there is no correlation. These differences may be explained by a multitude of factors, some of which may have a biological underpinning, but one possible explanation is that the measurement of genetic and species diversity is inconsistent across studies and even within studies. For example, some studies have correlated species richness, a measure that does not consider abundance, with gene diversity or heterozygosity, which are based on the frequency of genetic variants and give more weight to common than rare variants. In other cases, studies used consistent measures, but these were not accurate descriptors of diversity. For example, species and allelic richness are consistent measures, but they ignore an important aspect of diversity, namely the abundance of species and allelic variants. Our new framework provides “true diversity” measures that are consistent across levels of organization, and therefore they should help improve our understanding of the interactions between genetic and species diversities. In this sense, it provides a more nuanced assessment of the association between spatial structuring of species and genetic diversity. For example, a first but somewhat limited application of our framework to the Hawaiian archipelago data set uncovers a discrepancy between species and genetic diversity spatial patterns. The difference in species diversity between regional and island levels is much larger (26) than the difference in genetic diversity between these two levels (1244 for E. coruscans and 7 for Z. flavescens). Moreover, in the case of species diversity, differentiation among regions is much stronger than among populations within regions, but we observed the exact opposite pattern in the case of genetic diversity: genetic differentiation is weaker among regions than among islands within regions. This clearly indicates that species and genetic diversity spatial patterns are driven by different processes.

In our hierarchical framework and analysis based on Hill number of order q = 1, all species (or alleles) are considered to be equally distinct from each other, such that species (allelic) relatedness is not taken into account; only species abundances are considered. To incorporate evolutionary information among species, we have also extended Chao et al. (2010)'s phylogenetic diversity of order q = 1 to measure hierarchical diversity structure from genes to ecosystems (Table 3, last column). Chao et al. (2010)'s measure of order q = 1 reduces to a simple transformation of the phylogenetic entropy, which is a generalization of Shannon's entropy that incorporates phylogenetic distances among species (Allen et al., 2009). We have also derived the corresponding differentiation measures at each level of the hierarchy (bottom section of Table 3). Note that a phylogenetic tree encapsulates all the information about relationships among all species and individuals or a subset of them. Our proposed dendrogram-based phylogenetic diversity measures make use of all such relatedness information.

There are two other important types of diversity that we do not directly address in our formulation. These are trait-based functional diversity and molecular diversity based on DNA sequence data. In both of these cases, data at the population or species level is transformed into pairwise distance matrices. However, information contained in a distance matrix differs from that provided by a phylogenetic tree. Petchey and Gaston (2002) applied a clustering algorithm to the species pairwise distance matrix to construct a functional dendrogram and then obtain functional diversity measures. An unavoidable issue in their approach is how to select a distance metric and a clustering algorithm to construct the dendrogram; both distance metrics and clustering algorithm may lead to a loss or distortion of species and DNA sequence pairwise distance information. Indeed, Mouchet et al. (2008) demonstrated that the results obtained using this approach are highly dependent on the clustering method being used. Moreover, Maire, Grenouillet, Brosse, and Villeger (2015) noted that even the best dendrogram is often of very low quality. Thus, we do not necessarily suggest the use of dendrogram-based approaches focused on trait and DNA sequence data to generate a biodiversity decomposition at different hierarchical scales akin to the one used here for phylogenetic structure. An alternative approach to achieve this goal is to use distance-based functional diversity measures, and several such measures have been proposed (e.g., Chiu & Chao, 2014; Kosman, 2014; Scheiner et al., 2017a,b). However, the development of a hierarchical decomposition framework for distance-based diversity measures that satisfies all monotonicity and “true dissimilarity” properties is mathematically very complex. Nevertheless, we note that we are currently extending our framework to also cover this case.

The application of our framework to molecular data is performed under the assumption of the infinite allele mutation model. Thus, it cannot make use of the information contained in markers such as microsatellites and DNA sequences, for which it is possible to calculate distances between distinct alleles. We also assume that genetic markers are independent (i.e., they are in linkage equilibrium), which implies that we cannot use the information provided by the association of alleles at different loci. This situation is similar to that of functional diversity (see preceding paragraph) and requires the consideration of a distance matrix. More precisely, instead of considering allele frequencies, we need to focus on genotypic distances using measures such as those proposed by Kosman (1996) and Gregorius et al. (Gregorius, Gillet, & Ziehe, 2003). As mentioned before, we are currently extending our approach to distance-based data so as to obtain a hierarchical framework applicable to both trait-based functional diversity and DNA sequence-based molecular diversity.

An essential requirement in biodiversity research is to be able to characterize complex spatial patterns using informative diversity measures applicable to all levels of organization (from genes to ecosystems); the framework we propose fills this knowledge gap and, in doing so, provides new tools to make inferences about biodiversity processes from observed spatial patterns.


ACKNOWLEDGEMENTS

This work was assisted through participation in "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.

DATA ARCHIVING STATEMENT

All data used in this manuscript are available in DRYAD (https://doi.org/dx.doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).

ORCID

Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493

Christine Edwards http://orcid.org/0000-0001-8837-4872

REFERENCES

Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2

Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101

Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.

Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665

Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.

Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.

Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654

Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551

Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247

Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272

Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540

Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349

Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471

Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014

Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819

Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083

Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1

Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2

Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036

Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75

Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352

Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x

Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1

Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x

Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1

Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x

Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum x falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.


Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.

Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.

Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3

Macarthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x

Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.

Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299

Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x

Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x

Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.

Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x

Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.

Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.

Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696

Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B: Biological Sciences, 283(1829).

Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z

Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765

Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x

Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.

Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422

Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756

de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705

Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.

Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x

Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516

Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.

Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593


APPENDIX S1. Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
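The equivalence stated in Appendix S1 can be checked with a few lines of R. This is a minimal sketch, not part of the iDIP code: the helper names `expected_het` and `effective_alleles` are hypothetical, and the heterozygosity-based effective number is taken to be 1 / (1 − He), i.e., the Hill number of order q = 2.

```r
# Allele frequencies from the Appendix S1 table (alleles A1..A5).
pop1 <- c(0.5, 0.5, 0, 0, 0)
pop2 <- c(0.01, 0.10, 0.665, 0.215, 0.01)

# Expected heterozygosity: He = 1 - sum(p_i^2).
expected_het <- function(p) 1 - sum(p^2)

# Heterozygosity-based effective number of alleles: 1 / (1 - He) = 1 / sum(p_i^2).
effective_alleles <- function(p) 1 / sum(p^2)

he <- c(pop1 = expected_het(pop1), pop2 = expected_het(pop2))
ne <- c(pop1 = effective_alleles(pop1), pop2 = effective_alleles(pop2))

print(round(he, 3))  # both values are close to 0.5
print(round(ne, 2))  # both values are close to 2
```

Population 2 carries five alleles, yet its heterozygosity is (up to rounding of the tabulated frequencies) that of a population with two equally frequent alleles.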

APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol    Definition
${}^{q}D$    abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$    phylogenetic diversity of order q
$S$    total number of distinct elements (= species/alleles)
$H$    Shannon entropy
$K$    number of regions
$J_k$    number of local populations/communities within region k
$w_{jk}$    weight given to population/community j of region k
$w_{+k}$    weight given to region k
$D_{\alpha}^{(l)}$    alpha abundance diversity at level l of the hierarchy
$D_{\beta}^{(l)}$    beta abundance diversity at level l of the hierarchy
$D_{\gamma}^{(l)}$    gamma abundance diversity at level l of the hierarchy
$l$    superscript to denote population/community, subregion, region, …
$D_{\gamma}$    gamma abundance diversity at the ecosystem level
$N_{injk}$    number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$    total number of copies of allele/species i in population/community j of region k
$N_{+jk}$    total number of alleles/individuals in population j of region k
$N_{++k}$    total number of alleles/individuals in region k
$N_{+++}$    total number of alleles/individuals in the ecosystem
$p_{i|jk}$    frequency of allele/species i in population j of region k
$p_{i|+k}$    frequency of allele/species i in region k
$p_{i|++}$    frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$    alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$    alpha entropy for region k
$H_{\alpha}^{(l)}$    total alpha entropy at level l of the hierarchy
$\Delta_{D}^{(l)}$    abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$    number of branch segments in the phylogenetic tree
$L_i$    length of branch i = 1, 2, 3, …, B
$a_i$    total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$    mean branch length
$T$    depth of an ultrametric tree (= $\bar{T}$)
$Q$    Rao's quadratic entropy
$I$    phylogenetic entropy
$a_{i|jk}$    total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$    total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$    total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_{\alpha}^{(l)}$    alpha phylogenetic diversity at level l of the hierarchy
$PD_{\beta}^{(l)}$    beta phylogenetic diversity at level l of the hierarchy
$PD_{\gamma}^{(l)}$    gamma phylogenetic diversity at level l of the hierarchy
$PD_{\gamma}$    gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$    alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$    alpha phylogenetic entropy for region k
$I_{\alpha}^{(l)}$    total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$    phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)

APPENDIX S3. Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_{\gamma} = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)$$

and

$$H_{\alpha} = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1: For the gamma and alpha entropies defined above for the two-level hierarchy, we have the following inequalities:

$$0 \le H_{\gamma} - H_{\alpha} \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_{\gamma} - H_{\alpha} = 0$; when the J populations are completely distinct (no shared alleles), we have $H_{\gamma} - H_{\alpha} = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof: Since $f(x) = -x \ln x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_{\gamma} \ge H_{\alpha}$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_{\gamma} - H_{\alpha}$ is obtained as follows:

$$H_{\gamma} = -\sum_{i=1}^{S} \sum_{j=1}^{J} w_j p_{i|j} \ln\Big(\sum_{m=1}^{J} w_m p_{i|m}\Big) \le -\sum_{i=1}^{S} \sum_{j=1}^{J} w_j p_{i|j} \ln\big(w_j p_{i|j}\big)$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_{\alpha} - \sum_{j=1}^{J} w_j \ln w_j.$$

The inequality holds because $\sum_{m} w_m p_{i|m} \ge w_j p_{i|j}$ for every j. When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
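The two bounds of the theorem can be checked numerically. The following R sketch is illustrative only: the helper names (`shannon`, `two_level`), the weights and the three frequency matrices are hypothetical and are not taken from the iDIP code.

```r
# Shannon entropy with the convention 0 * ln(0) = 0.
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Two-level decomposition for an alleles-by-populations matrix P of relative
# frequencies (columns sum to 1) and population weights w (sum to 1).
two_level <- function(P, w) {
  h_gamma <- shannon(as.vector(P %*% w))    # entropy of pooled frequencies p_{i|+}
  h_alpha <- sum(w * apply(P, 2, shannon))  # weighted within-population entropy
  bound   <- shannon(w)                     # upper bound, -sum(w_j * ln(w_j))
  c(excess = h_gamma - h_alpha, bound = bound)
}

w <- c(0.2, 0.3, 0.5)                                   # hypothetical weights
same     <- matrix(rep(c(0.6, 0.3, 0.1), 3), ncol = 3)  # identical populations
disjoint <- diag(3)                                     # no shared alleles
mixed    <- cbind(c(0.6, 0.3, 0.1, 0), c(0.1, 0.1, 0.8, 0), c(0, 0.2, 0.3, 0.5))

res_same     <- two_level(same, w)
res_disjoint <- two_level(disjoint, w)
res_mixed    <- two_level(mixed, w)
print(rbind(same = res_same, disjoint = res_disjoint, mixed = res_mixed))
```

Identical populations sit at the lower bound, populations without shared alleles reach the upper bound, and partially overlapping populations fall strictly in between.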

From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_{D} = \frac{H_{\gamma} - H_{\alpha}}{-\sum_{j=1}^{J} w_j \ln w_j}. \qquad \text{(C1)}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2: Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) The Shannon differentiation measure is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.
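Property (B) lends itself to a quick numerical illustration. The R sketch below builds such a set of communities and evaluates Eq. (C1); the helper names (`shannon`, `make_communities`, `shannon_diff`) and the values J = 4, S = 10, A = 3 are hypothetical choices, and equal community weights are assumed by default.

```r
# Shannon entropy with the convention 0 * ln(0) = 0.
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# J communities, each with S equally common species, A of them shared by all
# communities and the remaining S - A private to each community.
make_communities <- function(J, S, A) {
  P <- matrix(0, nrow = A + J * (S - A), ncol = J)
  for (j in seq_len(J)) {
    if (A > 0) P[seq_len(A), j] <- 1 / S
    if (S > A) {
      private <- A + (j - 1) * (S - A) + seq_len(S - A)
      P[private, j] <- 1 / S
    }
  }
  P
}

# Shannon differentiation (Eq. C1); equal weights unless w is supplied.
shannon_diff <- function(P, w = rep(1 / ncol(P), ncol(P))) {
  h_gamma <- shannon(as.vector(P %*% w))
  h_alpha <- sum(w * apply(P, 2, shannon))
  (h_gamma - h_alpha) / shannon(w)
}

P <- make_communities(J = 4, S = 10, A = 3)
delta <- shannon_diff(P)
print(c(delta = delta, expected = 1 - 3 / 10))
```

With 3 of 10 species shared, the measure returns the proportion of non-shared species, 0.7, up to floating-point error.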

Proof: For Part (A), see Appendix S6 in Chao, Jost, et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_{\gamma} = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_{\alpha}^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_{\alpha}^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_{\gamma} = H_{\alpha}^{(1)} + \big[H_{\alpha}^{(2)} - H_{\alpha}^{(1)}\big] + \big[H_{\gamma} - H_{\alpha}^{(2)}\big]. \qquad \text{(C2)}$$

Here $H_{\alpha}^{(1)}$ denotes the within-population information, $[H_{\alpha}^{(2)} - H_{\alpha}^{(1)}]$ denotes the among-population information within a region, and $[H_{\gamma} - H_{\alpha}^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3: For the decomposition given in Eq. (C2) in the three-level hierarchy, we have the following inequalities:

$$0 \le H_{\gamma} - H_{\alpha}^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C3)}$$

$$0 \le H_{\alpha}^{(2)} - H_{\alpha}^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \qquad \text{(C4)}$$

When all populations have identical allele relative frequency distributions, we have $H_{\gamma} = H_{\alpha}^{(2)} = H_{\alpha}^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_{\alpha}^{(2)} - H_{\alpha}^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_{\gamma} - H_{\alpha}^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

Proof: To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions, with allele relative frequency $p_{i|+k}$ for allele i in region k, and with region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_{\gamma} = -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} w_{+k} p_{i|+k}\Big) \ln\Big(\sum_{m=1}^{K} w_{+m} p_{i|+m}\Big).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_{\alpha}^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations, with allele relative frequency $p_{i|jk}$ for allele i in population j, and with population weights $(w_{1k}/w_{+k},\, w_{2k}/w_{+k},\, \ldots,\, w_{J_k k}/w_{+k})$, $j = 1, 2, \ldots, J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding "alpha" entropy is $\sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} (w_{jk}/w_{+k}) \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}$. Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln\frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_{\alpha}^{(2)} - H_{\alpha}^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_{\alpha}^{(1)} \le H_{\alpha}^{(2)} \le H_{\gamma}$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_{D}^{(2)} = \frac{H_{\gamma} - H_{\alpha}^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \qquad \text{(C5)}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_{D}^{(1)} = \frac{H_{\alpha}^{(2)} - H_{\alpha}^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \qquad \text{(C6)}$$

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$H_{\gamma} = -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\big(w_{jk} p_{i|jk}\big)$$

$$= -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big[\ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big]$$

$$= -\sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk} - \sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln\frac{w_{jk}}{w_{+k}} - \sum_{i=1}^{S} \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k}$$

$$= H_{\alpha}^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \equiv H_{\alpha}^{(1)} + \big[H_{\alpha}^{(2)} - H_{\alpha}^{(1)}\big] + \big[H_{\gamma} - H_{\alpha}^{(2)}\big].$$

In this special case, we have $H_{\alpha}^{(2)} - H_{\alpha}^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_{\gamma} - H_{\alpha}^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
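The two extreme cases of Eqs. (C5) and (C6) can be reproduced with a short R sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `three_level_diff`, the region labels and the weights are hypothetical.

```r
# Shannon entropy with the convention 0 * ln(0) = 0.
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Three-level Shannon differentiation (Eqs. C5 and C6).
# P: alleles-by-populations matrix of relative frequencies (columns sum to 1)
# w: population weights w_jk (sum to 1)
# region: character vector giving the region label of each population
three_level_diff <- function(P, w, region) {
  w_region <- sapply(split(w, region), sum)              # region weights w_{+k}
  h_gamma  <- shannon(as.vector(P %*% w))
  h_alpha1 <- sum(w * apply(P, 2, shannon))
  h_alpha2 <- sum(sapply(names(w_region), function(k) {
    idx <- region == k
    pooled <- P[, idx, drop = FALSE] %*% (w[idx] / w_region[[k]])  # p_{i|+k}
    w_region[[k]] * shannon(as.vector(pooled))
  }))
  max2 <- shannon(w_region)                              # -sum w_{+k} ln w_{+k}
  max1 <- -sum(w * log(w / w_region[region]))            # -sum w_jk ln(w_jk / w_{+k})
  c(delta2 = (h_gamma - h_alpha2) / max2,
    delta1 = (h_alpha2 - h_alpha1) / max1)
}

region <- c("R1", "R1", "R2", "R2", "R2")
w <- c(10, 35, 30, 30, 60) / 165                         # hypothetical weights

res_distinct <- three_level_diff(diag(5), w, region)     # no shared alleles
res_same <- three_level_diff(matrix(rep(c(0.5, 0.3, 0.2), 5), ncol = 5), w, region)
print(rbind(distinct = res_distinct, same = res_same))
```

Completely distinct populations give a value of 1 at both levels, and identical populations give 0 at both levels, up to floating-point error.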

As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. Phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_{\gamma} = -\sum_{i=1}^{B} L_i a_{i|++} \ln a_{i|++}, \qquad I_{\alpha}^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i a_{i|+k} \ln a_{i|+k},$$

$$I_{\alpha}^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_{\gamma} = I_{\alpha}^{(1)} + \big[I_{\alpha}^{(2)} - I_{\alpha}^{(1)}\big] + \big[I_{\gamma} - I_{\alpha}^{(2)}\big]. \qquad \text{(C7)}$$

Here $I_{\alpha}^{(1)}$ denotes the within-population phylogenetic information, $[I_{\alpha}^{(2)} - I_{\alpha}^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_{\gamma} - I_{\alpha}^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4: For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_{\gamma} - I_{\alpha}^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C8)}$$

$$0 \le I_{\alpha}^{(2)} - I_{\alpha}^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \qquad \text{(C9)}$$

When all populations have identical allele relative frequency distributions, we have $I_{\gamma} = I_{\alpha}^{(2)} = I_{\alpha}^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_{\alpha}^{(2)} - I_{\alpha}^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $I_{\gamma} - I_{\alpha}^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
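To make the phylogenetic entropy concrete, the R sketch below evaluates $I = -\sum_i L_i a_i \ln a_i$ on a five-tip ultrametric tree with the topology ((1,2),3),(4,5). The tip abundances and branch lengths are hypothetical values chosen for illustration. The last two lines convert the entropy into effective numbers following Chao et al. (2010), exp(I/T) and T·exp(I/T); that this is the exact form used in Table 3 of the main text is an assumption.

```r
# Hypothetical relative abundances of the five tips (sum to 1).
p <- c(0.30, 0.20, 0.10, 0.25, 0.15)

# Node abundances: the five tips, then the internal nodes (1,2), (1,2,3), (4,5).
a <- c(p, p[1] + p[2], p[1] + p[2] + p[3], p[4] + p[5])

# Hypothetical lengths of the branch leading to each node, chosen so that
# every tip-to-root path has total length 3 (ultrametric tree of depth 3).
L <- c(1, 1, 2, 1.5, 1.5, 1, 1, 1.5)
tree_depth <- 3

mean_branch_length <- sum(L * a)                   # equals the depth T
phylo_entropy <- -sum(L * a * log(a))              # I = -sum_i L_i a_i ln a_i
mean_phylo_div <- exp(phylo_entropy / tree_depth)  # effective number of lineages
phylo_div <- tree_depth * mean_phylo_div           # order-1 PD (assumed form)

print(c(T_bar = mean_branch_length, I = phylo_entropy,
        D_bar = mean_phylo_div, PD = phylo_div))
```

For an ultrametric tree, the mean branch length equals the tree depth, and I/T cannot exceed the ordinary Shannon entropy of the tip abundances, so the effective number of lineages is at most the Hill number of order 1 computed from the tips alone.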

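SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions, and the red represent the ecosystem. Shannon entropies, $H_{*}$, are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1: diagram of the three hierarchical levels (1: population or community; 2: region; 3: ecosystem), with columns "Within", "Between", "Total" and "Decomposition". Population frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$ yield the population entropies $H_{\alpha 11}^{(1)}, \ldots, H_{\alpha 52}^{(1)}$ and their total $H_{\alpha}^{(1)}$; region frequencies $p_{i|+1}$, $p_{i|+2}$ yield $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$ and their total $H_{\alpha}^{(2)}$; ecosystem frequencies $p_{i|++}$ yield $H_{\gamma}$.]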

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set, where the first S elements (in the present example, S = 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $(a_1, a_2, \ldots, a_5, a_6, a_7, a_8) = (p_1, p_2, \ldots, p_5,\ p_1 + p_2,\ p_1 + p_2 + p_3,\ p_4 + p_5)$, where $p_i$ is the relative abundance of species i.

[Figure S2: ultrametric tree with tips $p_1$ to $p_5$, internal nodes $p_1 + p_2$, $p_1 + p_2 + p_3$ and $p_4 + p_5$, and root labelled MRCA.]

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle a "blank" or "NA" entry. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region1
        Population1
        Population2
    Region2
        Population3
        Population4
        Population5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

           Pop1    Pop2    Pop3    Pop4    Pop5
Allele1    1       16      2       10      15
Allele2    0       0       0       5       14
Allele3    7       12      11      1       0
Allele4    0       5       14      1       21
Allele5    2       1       0       11      10
Allele6    0       1       3       2       0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level; Level 2 = region level; Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

          Pop1           Pop2           Pop3           Pop4           Pop5
Level3    Ecosystem      Ecosystem      Ecosystem      Ecosystem      Ecosystem
Level2    Region1        Region1        Region2        Region2        Region2
Level1    Population1    Population2    Population3    Population4    Population5

For the above simple example, input data for the R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

# Run R function
iDIP(Data, Struc)

5

Output is given below:

                    [,1]
D_gamma            5.272
D_alpha2           4.679
D_alpha1           3.540
D_beta2            1.127
D_beta1            1.322
Proportion2        0.300
Proportion1        0.700
Differentiation2   0.204
Differentiation1   0.310
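The first entry of the output can be checked by hand. The sketch below assumes, as in the iDIP code, that populations are weighted by their sample sizes, so the pooled allele frequencies are simply row sums divided by the grand total. It computes the Shannon entropy of the pooled frequencies and exponentiates it to obtain the Hill number of order q = 1.

```r
# Example allele-count matrix (alleles in rows, populations in columns)
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))

p <- rowSums(Data) / sum(Data)   # pooled (ecosystem-level) allele frequencies
p <- p[p > 0]                    # drop absent alleles so that 0 * log(0) never occurs
H_gamma <- -sum(p * log(p))      # gamma Shannon entropy
D_gamma <- exp(H_gamma)          # effective number of alleles in the ecosystem
print(round(D_gamma, 3))
```

The printed value should agree with D_gamma in the output above, up to rounding.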

We give simple interpretations for the above output ("effective" is always meant in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma), up to rounding.

(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2), up to rounding.

(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
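The multiplicative relation between alpha, beta and gamma at the regional level can be reproduced with a few lines of base R. This is only an illustrative sketch, assuming sample-size weights as in the iDIP code; the helper name `shannon` is introduced here for illustration.

```r
# Shannon entropy of a vector of counts (zeros are dropped before taking logs)
shannon <- function(x) { p <- x[x > 0] / sum(x); -sum(p * log(p)) }

Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)                       # region membership of each population

# Pool the populations of each region (alleles in rows, regions in columns)
Reg <- sapply(sort(unique(region)),
              function(r) rowSums(Data[, region == r, drop = FALSE]))
w_reg <- colSums(Reg) / sum(Reg)                 # region weights

H_gamma  <- shannon(rowSums(Data))               # ecosystem-level entropy
H_alpha2 <- sum(w_reg * apply(Reg, 2, shannon))  # weighted mean within-region entropy

D_alpha2 <- exp(H_alpha2)                        # allele equivalents per region
D_beta2  <- exp(H_gamma - H_alpha2)              # region equivalents
print(round(c(D_alpha2 = D_alpha2, D_beta2 = D_beta2), 3))
```

Because the entropies decompose additively, their exponentials multiply exactly: `D_alpha2 * D_beta2` equals `exp(H_gamma)`.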

# R code for Information-based Diversity Partitioning for Any Number of Levels
#
# Input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
#
# Output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
#   (2) Proportion of total beta information found at each level.
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
#
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
# or "NA" in your data by 0 or any numerical value. Also, there must be at least
# one species/allele in a population or a community.
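A minimal sketch of the input check described in the note, assuming that blank cells have already been read into R as NA (as happens for numeric columns in most import functions). The helper name `prepare_abun` is hypothetical and is not part of the authors' code.

```r
# Hypothetical pre-processing helper: replace NA by 0 and verify that every
# population/community (column) contains at least one allele/species.
prepare_abun <- function(abun) {
  abun <- as.matrix(abun)
  abun[is.na(abun)] <- 0
  if (any(colSums(abun) == 0)) {
    stop("each population/community must contain at least one species/allele")
  }
  abun
}

raw   <- cbind(c(1, NA, 7), c(16, 0, NA))   # toy matrix with missing entries
clean <- prepare_abun(raw)
print(clean)
```

Replacing NA by 0 treats a missing entry as an absent allele; whether that is appropriate depends on why the entry is missing.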

# Main function iDIP

iDIP=function(abun,struc){
  n=sum(abun);N=ncol(abun);
  ga=rowSums(abun);
  gp=ga[ga>0]/n;                      # pooled (ecosystem-level) frequencies
  G=sum(-gp*log(gp));                 # gamma entropy
  S=length(gp);

  H=nrow(struc);                      # number of hierarchical levels
  A=numeric(H-1);W=numeric(H-1);B=numeric(H-1);
  Diff=numeric(H-1);Prop=numeric(H-1);

  # lowest level (populations/communities)
  wi=colSums(abun)/n;
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]));
  pi=sapply(1:N,function(k) abun[,k]/sum(abun[,k]));
  Ai=sapply(1:N,function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])));
  A[H-1]=sum(wi*Ai);

  # intermediate levels (regions, subregions, ...)
  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,]);NN=length(I);
      ai=matrix(0,ncol=NN,nrow=nrow(abun));
      for(j in 1:NN){
        II=which(struc[i,]==I[j]);
        if(length(II)==1){ai[,j]=abun[,II];
        }else{ai[,j]=rowSums(abun[,II])}
      }
      pi=sapply(1:NN,function(k) ai[,k]/sum(ai[,k]));
      wi=colSums(ai)/sum(ai);
      W[i-1]=-sum(wi*log(wi));
      Ai=sapply(1:NN,function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])));
      A[i-1]=sum(wi*Ai);
    }
  }

  total=G-A[H-1];
  Diff[1]=(G-A[1])/W[1];
  Prop[1]=(G-A[1])/total;
  B[1]=exp(G)/exp(A[1]);
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1]);
      Prop[i]=(A[i-1]-A[i])/total;
      B[i]=exp(A[i-1])/exp(A[i]);
    }
  }

  Gamma=exp(G);Alpha=exp(A);Diff=Diff;Prop=Prop;
  out=matrix(c(Gamma,Alpha,B,Prop,Diff),ncol=1)
  rownames(out)<-c(paste0("D_gamma"),
                   paste0("D_alpha",(H-1):1),
                   paste0("D_beta",(H-1):1),
                   paste0("Proportion",(H-1):1),
                   paste0("Differentiation",(H-1):1)
  )
  return(out)
}

Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) described above for species diversity, we also need to input a phylogenetic tree ("Tree") in Newick format.

(a) Abun: specifying species/allele (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see the simple example given above for species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.

## Input data
Data=cbind(c(1,0,7,0,2,0),c(16,0,12,5,1,1),c(2,0,11,14,0,3),c(10,5,1,1,11,2),c(15,14,0,21,10,0))
rownames(Data)=paste0("Allele",1:6)
Struc=rbind(rep("Ecosystem",5),c(rep("Region1",2),rep("Region2",3)),paste0("Community",1:5))
Tree=c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

## Run R function
iDIPphylo(Data,Struc,Tree)

Output is given below:

                 [,1]
Faith's PD    321.525
mean_T         94.169
PD_gamma      274.388
PD_alpha2     255.194
PD_alpha1     223.231
PD_beta2        1.075
PD_beta1        1.143
PD_prop2        0.351
PD_prop1        0.649
PD_diff2        0.124
PD_diff1        0.149

"Effective" in the following interpretation is meant in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma), up to rounding.

(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2), up to rounding.

(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
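Item (1) can be checked without any phylogenetics package, because Faith's PD for the full tree is just the sum of all branch lengths. The sketch below extracts the numbers that follow each colon in the Newick string with base R regular expressions. The decimal points in the tree string are a reconstruction of the example tree and should be treated as an assumption.

```r
# Example tree in Newick format (decimal placement assumed from the reported total)
Tree <- "(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);"

# Every branch length is written as ":<number>" in Newick format
tokens <- regmatches(Tree, gregexpr(":[0-9.]+", Tree))[[1]]
branch_lengths <- as.numeric(sub(":", "", tokens))

faith_pd <- sum(branch_lengths)   # total branch length of the tree
print(round(faith_pd, 3))
```

This shortcut only works when every tip of the tree is present in the data; for a subset of tips, the branches not leading to any observed tip would have to be excluded.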

# R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
#
# Input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
#   tree:  a Newick-format phylogenetic tree spanned by all focal species considered in a study
#
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank"
# or "NA" in your data by 0 or any numerical value. Also, there must be at least
# one species/allele in a population or a community.
#
# Output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root node
#       to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level.
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information
#       found at Level 1 and Level 2, respectively.
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1
#       measures the mean phylogenetic dissimilarity among communities (Level 1) within a
#       region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among
#       regions (Level 2), etc.

# Three packages ("ade4", "ape" and "phytools") must be installed first
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

# Main function iDIPphylo

iDIPphylo=function(abun,struc,tree){
  phyloData<-newick2phylog(tree)
  Temp<-as.matrix(abun[names(phyloData$leaves),])
  nodenames=c(names(phyloData$leaves),names(phyloData$nodes));
  # M[i,node]=1 if node lies on the path from the root to leaf i
  M=matrix(0,nrow=length(phyloData$leaves),ncol=length(nodenames),
           dimnames=list(names(phyloData$leaves),nodenames))
  for(i in 1:length(phyloData$leaves)){
    M[i,][unlist(phyloData$paths[i])]=rep(1,length(unlist(phyloData$paths[i])))
  }

  # pA: abundance descended from each node/branch, per population
  pA=matrix(0,ncol=ncol(abun),nrow=length(nodenames),
            dimnames=list(nodenames,colnames(abun)))
  for(i in 1:ncol(abun)){pA[,i]=Temp[,i]%*%M;}
  pB=c(phyloData$leaves,phyloData$nodes);   # branch lengths

  n=sum(abun);N=ncol(abun);
  ga=rowSums(pA);
  gp=ga/n;TT=sum(gp*pB);                    # TT: abundance-weighted mean distance to root
  G=sum(-pB[gp>0]*gp[gp>0]/TT*log(gp[gp>0]/TT))
  PD=sum(pB[gp>0]);                         # Faith's PD

  H=nrow(struc);
  A=numeric(H-1);W=numeric(H-1);B=numeric(H-1);
  Diff=numeric(H-1);Prop=numeric(H-1);

  wi=colSums(abun)/n;
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]));
  pi=sapply(1:N,function(k) pA[,k]/sum(abun[,k]));
  Ai=sapply(1:N,function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)));
  A[H-1]=sum(wi*Ai);

  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,]);NN=length(I);
      pi=matrix(0,ncol=NN,nrow=nrow(pA));ni=numeric(NN);
      for(j in 1:NN){
        II=which(struc[i,]==I[j]);
        if(length(II)==1){pi[,j]=pA[,II]/sum(abun[,II]);ni[j]=sum(abun[,II])
        }else{pi[,j]=rowSums(pA[,II])/sum(abun[,II]);ni[j]=sum(abun[,II])}
      }
      wi=ni/sum(ni);
      W[i-1]=-sum(wi*log(wi));
      Ai=sapply(1:NN,function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)));
      A[i-1]=sum(wi*Ai);
    }
  }

  total=G-A[H-1];
  Diff[1]=(G-A[1])/W[1];
  Prop[1]=(G-A[1])/total;
  B[1]=exp(G)/exp(A[1]);
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1]);
      Prop[i]=(A[i-1]-A[i])/total;
      B[i]=exp(A[i-1])/exp(A[i]);
    }
  }

  Gamma=exp(G)*TT;Alpha=exp(A)*TT;Diff=Diff;Prop=Prop;
  out=matrix(c(PD,TT,Gamma,Alpha,B,Prop,Diff),ncol=1)
  rownames(out)<-c(paste("Faith's PD"),
                   paste("mean_T"),
                   paste0("PD_gamma"),
                   paste0("PD_alpha",(H-1):1),
                   paste0("PD_beta",(H-1):1),
                   paste0("PD_prop",(H-1):1),
                   paste0("PD_diff",(H-1):1)
  )
  return(out)
}


ACKNOWLEDGEMENTS

This work was assisted through participation in the "Next Generation Genetic Monitoring" Investigative Workshop at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. Hawaiian fish community data were provided by the NOAA Pacific Islands Fisheries Science Center's Coral Reef Ecosystem Division (CRED) with funding from NOAA Coral Reef Conservation Program. O.E.G. was supported by the Marine Alliance for Science and Technology for Scotland (MASTS). A.C. and C.H.C. were supported by the Ministry of Science and Technology, Taiwan. P.P.-N. was supported by a Canada Research Chair in Spatial Modelling and Biodiversity. K.A.S. was supported by National Science Foundation (BioOCE Award Number 1260169) and the National Center for Ecological Analysis and Synthesis.

DATA ARCHIVING STATEMENT

All data used in this manuscript are available in DRYAD (https://doi.org/10.5061/dryad.qm288) and BCO-DMO (http://www.bco-dmo.org/project/552879).

ORCID

Oscar E. Gaggiotti http://orcid.org/0000-0003-1827-1493

Christine Edwards http://orcid.org/0000-0001-8837-4872

REFERENCES

Agrawal, A. A. (2003). Community genetics: New insights into community ecology by integrating population genetics. Ecology, 84(3), 543–544. https://doi.org/10.1890/0012-9658(2003)084[0543:CGNIIC]2.0.CO;2

Allen, B., Kon, M., & Bar-Yam, Y. (2009). A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. American Naturalist, 174(2), 236–243. https://doi.org/10.1086/600101

Allendorf, F. W., Luikart, G. H., & Aitken, S. N. (2012). Conservation and the genetics of populations, 2nd ed. Hoboken, NJ: Wiley-Blackwell.

Andrews, K. R., Moriwake, V. N., Wilcox, C., Grau, E. G., Kelley, C., Pyle, R. L., & Bowen, B. W. (2014). Phylogeographic analyses of submesophotic snappers Etelis coruscans and Etelis "marshi" (Family Lutjanidae) reveal concordant genetic structure across the Hawaiian Archipelago. PLoS One, 9(4), e91665. https://doi.org/10.1371/journal.pone.0091665

Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8(1), 1–21.

Beaumont, M. A., Zhang, W. Y., & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025–2035.

Bunge, J., Willis, A., & Walsh, F. (2014). Estimating the number of species in microbial diversity studies. Annual Review of Statistics and Its Application, 1, edited by S. E. Fienberg, 427–445. https://doi.org/10.1146/annurev-statistics-022513-115654

Chao, A., & Chiu, C.-H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7(8), 919–928. https://doi.org/10.1111/2041-210X.12551

Chao, A., Chiu, C. H., Hsieh, T. C., Davis, T., Nipperess, D. A., & Faith, D. P. (2015). Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6(4), 380–388. https://doi.org/10.1111/2041-210X.12247

Chao, A., Chiu, C. H., & Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 365(1558), 3599–3609. https://doi.org/10.1098/rstb.2010.0272

Chao, A. N., Chiu, C. H., & Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics, 45, edited by D. J. Futuyma, 297–324. https://doi.org/10.1146/annurev-ecolsys-120213-091540

Chao, A., & Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6(8), 873–882. https://doi.org/10.1111/2041-210X.12349

Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B., & Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS One, 10(6), e0125471. https://doi.org/10.1371/journal.pone.0125471

Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS One, 9(7), e100014. https://doi.org/10.1371/journal.pone.0100014

Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819

Eble, J. A., Toonen, R. J., Sorenson, L., Basch, L. V., Papastamatiou, Y. P., & Bowen, B. W. (2011). Escaping paradise: Larval export from Hawaii in an Indo-Pacific reef fish, the yellow tang (Zebrasoma flavescens). Marine Ecology Progress Series, 428, 245–258. https://doi.org/10.3354/meps09083

Ellison, A. M. (2010). Partitioning diversity. Ecology, 91(7), 1962–1963. https://doi.org/10.1890/09-1692.1

Friedlander, A. M., Brown, E. K., Jokiel, P. L., Smith, W. R., & Rodgers, K. S. (2003). Effects of habitat, wave exposure, and marine protected area status on coral reef fish assemblages in the Hawaiian archipelago. Coral Reefs, 22(3), 291–305. https://doi.org/10.1007/s00338-003-0317-2

Gregorius, H. R., Gillet, E. M., & Ziehe, M. (2003). Measuring differences of trait distributions between populations. Biometrical Journal, 45(8), 959–973. https://doi.org/10.1002/(ISSN)1521-4036

Hendry, A. P. (2013). Key questions in the genetics and genomics of eco-evolutionary dynamics. Heredity, 111(6), 456–466. https://doi.org/10.1038/hdy.2013.75

Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. https://doi.org/10.2307/1934352

Jost, L. (2006). Entropy and diversity. Oikos, 113(2), 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x

Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology, 88(10), 2427–2439. https://doi.org/10.1890/06-1736.1

Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015–4026. https://doi.org/10.1111/j.1365-294X.2008.03887.x

Jost, L. (2010). Independence of alpha and beta diversities. Ecology, 91(7), 1969–1974. https://doi.org/10.1890/09-0368.1

Jost, L., DeVries, P., Walla, T., Greeney, H., Chao, A., & Ricotta, C. (2010). Partitioning diversity for conservation analyses. Diversity and Distributions, 16(1), 65–76. https://doi.org/10.1111/j.1472-4642.2009.00626.x

Karlin, E. F., & Smouse, P. E. (2017). Allo-allo-triploid Sphagnum x falcatulum: Single individuals contain most of the Holantarctic diversity for ancestrally indicative markers. Annals of Botany, 120(2), 221–231.


Kimura, M., & Crow, J. F. (1964). Number of alleles that can be maintained in finite populations. Genetics, 49(4), 725–738.

Kosman, E. (1996). Difference and diversity of plant pathogen populations: A new approach for measuring. Phytopathology, 86(11), 1152–1155.

Kosman, E. (2014). Measuring diversity: From individuals to populations. European Journal of Plant Pathology, 138(3), 467–486. https://doi.org/10.1007/s10658-013-0323-3

MacArthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533. https://doi.org/10.1111/j.1469-185X.1965.tb00815.x

Magurran, A. E. (2004). Measuring biological diversity. Hoboken, NJ: Blackwell Science.

Maire, E., Grenouillet, G., Brosse, S., & Villeger, S. (2015). How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography, 24(6), 728–740. https://doi.org/10.1111/geb.12299

Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: F-ST and related measures. Molecular Ecology Resources, 11(1), 5–18. https://doi.org/10.1111/j.1755-0998.2010.02927.x

Mendes, R. S., Evangelista, L. R., Thomaz, S. M., Agostinho, A. A., & Gomes, L. C. (2008). A unified index to measure ecological diversity and species rarity. Ecography, 31(4), 450–456. https://doi.org/10.1111/j.0906-7590.2008.05469.x

Mouchet, M., Guilhaumon, F., Villeger, S., Mason, N. W. H., Tomasini, J. A., & Mouillot, D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117(5), 794–800. https://doi.org/10.1111/j.0030-1299.2008.16594.x

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.

Petchey, O. L., & Gaston, K. J. (2002). Functional diversity (FD), species richness and community composition. Ecology Letters, 5, 402–411. https://doi.org/10.1046/j.1461-0248.2002.00339.x

Provine, W. B. (1971). The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.

Randall, J. E. (1998). Zoogeography of shore fishes of the Indo-Pacific region. Zoological Studies, 37(4), 227–268.

Rao, C. R., & Nayak, T. K. (1985). Cross entropy, dissimilarity measures, and characterizations of quadratic entropy. IEEE Transactions on Information Theory, 31(5), 589–593. https://doi.org/10.1109/TIT.1985.1057082

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017a). The components of biodiversity, with a particular focus on phylogenetic information. Ecology and Evolution, 7(16), 6444–6454. https://doi.org/10.1002/ece3.3199

Scheiner, S. M., Kosman, E., Presley, S. J., & Willig, M. R. (2017b). Decomposing functional diversity. Methods in Ecology and Evolution, 8(7), 809–820. https://doi.org/10.1111/2041-210X.12696

Selkoe, K. A., Gaggiotti, O. E., Treml, E. A., Wren, J. L. K., Donovan, M. K., Toonen, R. J., & Hawaii Reef Connectivity Consortium (2016). The DNA of coral reef biodiversity: Predicting and protecting genetic diversity of reef assemblages. Proceedings of the Royal Society B: Biological Sciences, 283(1829).

Selkoe, K. A., Halpern, B. S., Ebert, C. M., Franklin, E. C., Selig, E. R., Casey, K. S., … Toonen, R. J. (2009). A map of human impacts to a "pristine" coral reef ecosystem, the Papahānaumokuākea Marine National Monument. Coral Reefs, 28(3), 635–650. https://doi.org/10.1007/s00338-009-0490-z

Sherwin, W. B. (2010). Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy, 12(7), 1765–1798. https://doi.org/10.3390/e12071765

Sherwin, W. B., Jabot, F., Rush, R., & Rossetto, M. (2006). Measurement of biological information with applications from genes to landscapes. Molecular Ecology, 15(10), 2857–2869. https://doi.org/10.1111/j.1365-294X.2006.02992.x

Slatkin, M., & Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129(2), 555–562.

Smouse, P. E., Whitehead, M. R., & Peakall, R. (2015). An informational diversity framework, illustrated with sexually deceptive orchids in early stages of speciation. Molecular Ecology Resources, 15(6), 1375–1384. https://doi.org/10.1111/1755-0998.12422

Vellend, M., Lajoie, G., Bourret, A., Murria, C., Kembel, S. W., & Garant, D. (2014). Drawing ecological inferences from coincident patterns of population- and community-level biodiversity. Molecular Ecology, 23(12), 2890–2901. https://doi.org/10.1111/mec.12756

de Villemereuil, P., Frichot, E., Bazin, E., Francois, O., & Gaggiotti, O. E. (2014). Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23(8), 2006–2019. https://doi.org/10.1111/mec.12705

Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.

Whitlock, M. C. (2011). G'ST and D do not replace F-ST. Molecular Ecology, 20(6), 1083–1091. https://doi.org/10.1111/j.1365-294X.2010.04996.x

Williams, I. D., Baum, J. K., Heenan, A., Hanson, K. M., Nadon, M. O., & Brainard, R. E. (2015). Human, oceanographic and habitat drivers of central and western Pacific coral reef fish assemblages. PLoS One, 10(4), e0120516. https://doi.org/10.1371/journal.pone.0120516

Wolda, H. (1981). Similarity indexes, sample size and diversity. Oecologia, 50(3), 296–302.

Wright, S. (1951). The genetical structure of populations. Annals of Eugenics, 15(4), 323–354.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593

APPENDIX S1 Effective number of alleles – A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele    Population 1    Population 2
A1        0.5             0.01
A2        0.5             0.10
A3        0               0.665
A4        0               0.215
A5        0               0.01
He        0.5             0.5
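The equivalence in this table can be reproduced numerically. In the sketch below the equivalence class is defined by expected heterozygosity, so the effective number of alleles is the number of equally frequent alleles that would give the same He, namely 1/(1 - He). The function names `He` and `n_eff` are chosen here for illustration.

```r
pop1 <- c(0.5, 0.5, 0, 0, 0)                # two equally abundant alleles
pop2 <- c(0.01, 0.10, 0.665, 0.215, 0.01)   # five alleles, very uneven

He    <- function(p) 1 - sum(p^2)           # expected heterozygosity
n_eff <- function(p) 1 / sum(p^2)           # same as 1 / (1 - He(p))

print(round(c(He1 = He(pop1), He2 = He(pop2)), 3))
print(round(c(n1 = n_eff(pop1), n2 = n_eff(pop2)), 3))
```

Both populations give an effective number close to two, even though population 2 carries five distinct alleles. Note that this illustration uses the heterozygosity-based effective number, whereas the framework in this article is built on the Shannon-based Hill number of order q = 1.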

APPENDIX S2 – Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol                      Definition
${}^{q}D$                   abundance diversity of order q, also referred to as Hill number of order q
${}^{q}PD$                  phylogenetic diversity of order q
$S$                         total number of distinct elements = species, alleles
$H$                         Shannon entropy
$K$                         number of regions
$J_k$                       number of local populations/communities within region k
$w_{jk}$                    weight given to population/community j of region k
$w_{+k}$                    weight given to region k
$D_\alpha^{(l)}$            alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$             beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$            gamma abundance diversity at level l of the hierarchy
$l$                         superscript to denote population/community, subregion, region, …
$D_\gamma$                  gamma abundance diversity at the ecosystem level
$N_{injk}$                  number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$                   total number of copies of allele/species i in population/community j of region k
$N_{+jk}$                   total number of alleles/individuals in population j of region k
$N_{++k}$                   total number of alleles/individuals in region k
$N_{+++}$                   total number of alleles/individuals in the ecosystem
$p_{i|jk}$                  frequency of allele/species i in population j of region k
$p_{i|+k}$                  frequency of allele/species i in region k
$p_{i|++}$                  frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$       alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$        alpha entropy for region k
$H_\alpha^{(l)}$            total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$            abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$                         number of branch segments in the phylogenetic tree
$L_i$                       length of branch i = 1, 2, 3, ⋯, B
$a_i$                       total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$                   mean branch length
$T$                         depth of an ultrametric tree (= $\bar{T}$)
$Q$                         Rao's quadratic entropy
$I$                         phylogenetic entropy
$a_{i|jk}$                  total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$                  total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$                  total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$           alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$            beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$           gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$                 gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$       alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$        alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$            total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$         phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)

APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \cdots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j'=1}^{J} w_{j'} p_{i|j'}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \ln x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e., all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

$$H_\gamma = -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\Big(\sum_{j'=1}^{J} w_{j'} p_{i|j'}\Big)\Big] \le -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln\big(w_j p_{i|j}\big)\Big]$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.

From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:

(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.

(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 - A/S, the true proportion of non-shared species in a community.
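The two extreme cases of Eq. (C1) and property (B) can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function name `shannon_diff` is hypothetical, populations are given as columns of relative frequencies, and equal weights are assumed by default.

```r
# Shannon differentiation (Eq. C1) for a matrix P of relative frequencies
# (alleles/species in rows, populations/communities in columns) and weights w.
shannon_diff <- function(P, w = rep(1 / ncol(P), ncol(P))) {
  ent <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }
  H_gamma <- ent(as.vector(P %*% w))        # entropy of the weighted pooled frequencies
  H_alpha <- sum(w * apply(P, 2, ent))      # weighted mean of within-population entropies
  (H_gamma - H_alpha) / ent(w)              # normalise by -sum(w * log(w))
}

identical_pops <- cbind(c(0.5, 0.3, 0.2), c(0.5, 0.3, 0.2))
distinct_pops  <- cbind(c(0.5, 0.5, 0, 0), c(0, 0, 0.3, 0.7))

# Property (B): S = 4 equally common species per community, A = 1 shared
S <- 4; A <- 1
partly_shared <- cbind(c(1, 1, 1, 1, 0, 0, 0), c(1, 0, 0, 0, 1, 1, 1)) / S

print(c(identical = shannon_diff(identical_pops),
        distinct  = shannon_diff(distinct_pops),
        partial   = shannon_diff(partly_shared),
        expected  = 1 - A / S))
```

For the partly shared case the pooled entropy is ln S + (1 - A/S) ln 2 and the alpha entropy is ln S, so the ratio reduces to 1 - A/S, in line with property (B).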

Proof. For Part (A), see Appendix S6 in Chao, Jost et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, the Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \big[H_\alpha^{(2)} - H_\alpha^{(1)}\big] + \big[H_\gamma - H_\alpha^{(2)}\big]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in a three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln\frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln(w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

Proof To prove Eq (C3) we consider the following two-level (ecosystemregion) hierarchy there are K regions with allele relative frequency 119901K|(s for species i in region k with the region weights Q119908(P 119908(S⋯ 119908(TU Note that we can express 119901K|(( as

119901K|(( = sum sum 119908Ls119901K|LsTuLOP

tsOP = sum 119908(s119901K|(st

sOP Then the ldquogammardquo entropy for this two-level system is

119867W = minussum Qsum 119908(s119901K|(slnQsum 119908([119901K|([t[OP Ut

sOP UNKOP

The corresponding ldquoalphardquo entropy for this two-level system is minussum 119908(s sum 119901K|(s ln 119901K|(sN

zOPtsOP which is 119867

(S) Eq (C3) then follows directly from Theorem C1

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations, with relative frequency $p_{i|jk}$ for allele i in population j, and population weights $\left(\frac{w_{1k}}{w_{+k}}, \frac{w_{2k}}{w_{+k}}, \ldots, \frac{w_{J_k k}}{w_{+k}}\right)$, $j = 1, 2, \ldots, J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region k, i.e.

$$H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k}, \quad\text{where}\quad p_{i|+k} = \sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\,p_{i|jk}.$$

The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\,H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\,H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k} + \sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk} \le -\sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\ln\frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k}\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k} + \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4). From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e. the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k}\ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\left(w_{jk}/w_{+k}\right)}. \tag{C6}$$
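Eqs. (C5) and (C6) can be evaluated directly from a count matrix. The sketch below is a minimal, non-authoritative R illustration that uses the allele counts of the five-population example from the R code section of this supplement; it assumes weights equal to relative population sizes, and the helper `shannon` and the variable names are illustrative rather than part of the authors' code.

```r
# Allele counts (rows = alleles, columns = populations) from the worked example.
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)   # populations 1-2 in region 1, populations 3-5 in region 2

# Shannon entropy of a vector of counts (zero counts are dropped).
shannon <- function(x) { p <- x[x > 0] / sum(x); -sum(p * log(p)) }

w_jk <- colSums(Data) / sum(Data)                 # population weights
w_k  <- as.numeric(tapply(w_jk, region, sum))     # region weights
reg  <- sapply(1:2, function(k) rowSums(Data[, region == k, drop = FALSE]))

H_gamma  <- shannon(rowSums(Data))
H_alpha2 <- sum(w_k * apply(reg, 2, shannon))
H_alpha1 <- sum(w_jk * apply(Data, 2, shannon))

Delta2 <- (H_gamma - H_alpha2) / (-sum(w_k * log(w_k)))                     # Eq. (C5)
Delta1 <- (H_alpha2 - H_alpha1) / (-sum(w_jk * log(w_jk / w_k[region])))    # Eq. (C6)
round(c(Delta2 = Delta2, Delta1 = Delta1), 3)
```

Under these assumptions the two values should agree with the differentiation values reported in the example output of the R function (0.204 among regions and 0.310 among populations within regions).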

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$
\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S}\left(\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln\left(w_{jk}\,p_{i|jk}\right)\right) \\
&= -\sum_{i=1}^{S}\left(\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\left[\ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k}\right]\right) \\
&= -\sum_{i=1}^{S}\left(\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln p_{i|jk}\right) - \sum_{i=1}^{S}\left(\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln\frac{w_{jk}}{w_{+k}}\right) - \sum_{i=1}^{S}\left(\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln w_{+k}\right) \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k}\ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right].
\end{aligned}
$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}}$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\ln w_{+k}$.

As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. Shannon gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i\,a_{i|++}\ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\sum_{i=1}^{B} L_i\,a_{i|+k}\ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\sum_{i=1}^{B} L_i\,a_{i|jk}\ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \left[I_\alpha^{(2)} - I_\alpha^{(1)}\right] + \left[I_\gamma - I_\alpha^{(2)}\right]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem, so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T\sum_{k=1}^{K} w_{+k}\ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have

$$I_\alpha^{(2)} - I_\alpha^{(1)} = -T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}} \quad\text{and}\quad I_\gamma - I_\alpha^{(2)} = -T\sum_{k=1}^{K} w_{+k}\ln w_{+k}.$$

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
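As a worked illustration of the equality case for the among-region component, consider a hypothetical ecosystem (an assumption for illustration, not an example from the paper) with K = 2 regions of equal weight, $w_{+1} = w_{+2} = 1/2$, that share no branches of an ultrametric tree with depth T. The sketch relies on the property that $\sum_i L_i\,a_{i|+k} = T$ for every region under an ultrametric tree, and on $a_{i|++} = w_{+k}\,a_{i|+k}$ for a branch i found only in region k.

```latex
\begin{align*}
I_\gamma
  &= -\sum_{k=1}^{2}\sum_{i \in \text{region } k} L_i\, w_{+k}\, a_{i|+k}
     \left[\ln w_{+k} + \ln a_{i|+k}\right] \\
  &= I_\alpha^{(2)} - \sum_{k=1}^{2} w_{+k}\ln w_{+k} \sum_{i} L_i\, a_{i|+k} \\
  &= I_\alpha^{(2)} - T\sum_{k=1}^{2} w_{+k}\ln w_{+k}
   = I_\alpha^{(2)} + T\ln 2 .
\end{align*}
```

The among-region phylogenetic information then equals its upper bound $T\ln 2$, so the normalized differentiation among regions is 1.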

SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represent the ecosystem. Shannon entropies $H$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 labels: hierarchical levels 3 (Ecosystem), 2 (Region) and 1 (Population or Community); columns Within, Between, Total and Decomposition; frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$ (populations), $p_{i|+1}$, $p_{i|+2}$ (regions) and $p_{i|++}$ (ecosystem); entropies $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$, $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$, $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$, $H_\alpha^{(1)}$, $H_\alpha^{(2)}$ and $H_\gamma$.]

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\ p_1 + p_2,\ p_1 + p_2 + p_3,\ p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2 labels: tips $p_1$, $p_2$, $p_3$, $p_4$, $p_5$; internal nodes $p_1 + p_2$, $p_4 + p_5$, $p_1 + p_2 + p_3$; root MRCA.]
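The extended abundance set described for Figure S2 can be sketched in R as follows. The relative abundances and branch lengths below are hypothetical values chosen only so that the tree is ultrametric with depth 3; the names `p`, `a`, `L` and `T_depth` are illustrative. The order-1 phylogenetic diversity is computed in the same form as the gamma term of the phylogenetic R function given later in this supplement.

```r
# Hypothetical relative abundances of the five tip species.
p <- c(0.30, 0.20, 0.10, 0.25, 0.15)

# Extended abundance set for the 8 non-root nodes of the tree in Figure S2.
a <- c(p, p[1] + p[2], p[1] + p[2] + p[3], p[4] + p[5])

# Hypothetical branch lengths, in the same order as `a`, chosen so that every
# root-to-tip path has length 3 (ultrametric tree).
L <- c(1, 1, 2, 1, 1,   # tips 1 to 5
       1,               # branch above node (1,2)
       1,               # branch above node (1,2,3)
       2)               # branch above node (4,5)
T_depth <- 3

# For an ultrametric tree the abundance-weighted branch length equals the depth.
sum(L * a)

I_phylo <- -sum(L * a * log(a))                              # phylogenetic entropy
G       <- -sum(L * (a / T_depth) * log(a / T_depth))
PD1     <- exp(G)                                            # order-1 phylogenetic diversity
```

Because $\sum_i L_i a_i = T$ here, `PD1` coincides with `T_depth * exp(I_phylo / T_depth)`.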

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entry. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1: Population 1, Population 2
    Region 2: Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

          Pop1  Pop2  Pop3  Pop4  Pop5
Allele1      1    16     2    10    15
Allele2      0     0     0     5    14
Allele3      7    12    11     1     0
Allele4      0     5    14     1    21
Allele5      2     1     0    11    10
Allele6      0     1     3     2     0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

         Pop1         Pop2         Pop3         Pop4         Pop5
Level3   Ecosystem    Ecosystem    Ecosystem    Ecosystem    Ecosystem
Level2   Region1      Region1      Region2      Region2      Region2
Level1   Population1  Population2  Population3  Population4  Population5

For the above simple example, input data for R function iDIP (code given below) are shown below.

# Input data
Data=cbind(c(1,0,7,0,2,0),c(16,0,12,5,1,1),c(2,0,11,14,0,3),c(10,5,1,1,11,2),c(15,14,0,21,10,0))
Struc=rbind(rep("Ecosystem",5),c(rep("Region1",2),rep("Region2",3)),paste0("Population",1:5))

# Run R function
iDIP(Data,Struc)

Output is given below:

                    [,1]
D_gamma            5.272
D_alpha.2          4.679
D_alpha.1          3.540
D_beta.2           1.127
D_beta.1           1.322
Proportion.2       0.300
Proportion.1       0.700
Differentiation.2  0.204
Differentiation.1  0.310

We give simple interpretations for the above output (all the "effective" is in a sense of equally abundant alleles/populations/regions):

(1a) D_gamma = 5.272 is interpreted as that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha.2 = 4.679 is interpreted as that each region contains 4.679 allele equivalents. D_beta.2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma), up to rounding.

(1c) D_alpha.1 = 3.540 is interpreted as that each population within a region contains 3.540 allele equivalents. D_beta.1 = 1.322 is interpreted as that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha.2), up to rounding.

(2) Proportion.2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion.1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation.2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation.1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e. the mean proportion of non-shared alleles in a population is around 31.0%.
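The "effective number" interpretation of D_gamma can be reproduced by hand from the example counts. The following is a minimal R sketch, assuming only the allele-by-population matrix shown above; the variable names are illustrative.

```r
# Allele counts (rows = alleles, columns = populations) from the worked example.
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))

# Pooled (ecosystem-level) relative frequency of each allele.
p <- rowSums(Data) / sum(Data)
p <- p[p > 0]

H_gamma <- -sum(p * log(p))   # Shannon entropy of the pooled frequencies
D_gamma <- exp(H_gamma)       # Hill number of order q = 1
round(D_gamma, 3)
```

The result should match the reported D_gamma of 5.272: six alleles with unequal frequencies carry the same Shannon diversity as about 5.27 equally abundant alleles.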

# R code for Information-based Diversity Partitioning for Any Number of Levels
# Input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community
#          frequency matrix data).
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix).
# Output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
#   (2) Proportion of total beta information found at each level.
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation.1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation.2 measures the mean dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entry. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.

# Main function iDIP
iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)                  # pooled counts per allele in the ecosystem
  gp = ga[ga > 0] / n                 # ecosystem-level relative frequencies
  G = sum(-gp * log(gp))              # gamma entropy
  S = length(gp)

  H = nrow(struc)                     # number of hierarchical levels
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  # Lowest level (populations): weights, entropy of the weights, alpha entropy
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H - 1] = sum(wi * Ai)

  # Intermediate levels (regions, ...): pool the populations of each aggregate
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          ai[, j] = abun[, II]
        } else {
          ai[, j] = rowSums(abun[, II])
        }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i - 1] = sum(wi * Ai)
    }
  }

  total = G - A[H - 1]                # total beta information
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }

  Gamma = exp(G); Alpha = exp(A)      # entropies converted to effective numbers
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha.", (H - 1):1),
                     paste0("D_beta.", (H - 1):1),
                     paste0("Proportion.", (H - 1):1),
                     paste0("Differentiation.", (H - 1):1))
  return(out)
}

Phylogenetic Diversity (R function iDIP.phylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described in the species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree format. iDIP cannot handle "blank" or "NA" entry. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see the simple example given above for the species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity for illustration. A simulated phylogenetic tree for 6 species is given below.

# Input data
Data=cbind(c(1,0,7,0,2,0),c(16,0,12,5,1,1),c(2,0,11,14,0,3),c(10,5,1,1,11,2),c(15,14,0,21,10,0))
rownames(Data)=paste0("Allele",1:6)
Struc=rbind(rep("Ecosystem",5),c(rep("Region1",2),rep("Region2",3)),paste0("Community",1:5))
Tree=c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

# Run R function
iDIP.phylo(Data,Struc,Tree)

Output is given below:

                [,1]
Faith's PD   321.525
mean_T        94.169
PD_gamma     274.388
PD_alpha.2   255.194
PD_alpha.1   223.231
PD_beta.2      1.075
PD_beta.1      1.143
PD_prop.2      0.351
PD_prop.1      0.649
PD_diff.2      0.124
PD_diff.1      0.149

The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 is interpreted as that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha.2 = 255.194 is interpreted as that the effective total branch length per region is 255.194. PD_beta.2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma), up to rounding.

(3c) PD_alpha.1 = 223.231 is interpreted as that the effective total branch length per population within each region is 223.231. PD_beta.1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha.2), up to rounding.

(4) PD_prop.2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop.1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff.2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff.1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e. the mean proportion of non-shared lineages in a community is around 14.9%.

# R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
# Input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community
#          frequency matrix data).
#   struc: level-by-population hierarchical structure matrix (or level-by-community
#          hierarchical structure matrix).
#   tree:  a Newick-format phylogenetic tree spanned by all focal species considered
#          in a study.
# NOTE: iDIP cannot handle "blank" or "NA" entry. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.
# Output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root node
#       to each of the tips in a phylogenetic tree. For an ultrametric tree,
#       mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD
#       for each level.
#   (4) PD_prop.1 and PD_prop.2 measure the proportions of total phylogenetic beta
#       information found in Level 1 and Level 2, respectively.
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example,
#       PD_diff.1 measures the mean phylogenetic dissimilarity among communities
#       (Level 1) within a region (Level 2); PD_diff.2 measures the mean phylogenetic
#       dissimilarity among regions (Level 2), etc.

# Three packages ("ade4", "ape" and "phytools") must be installed first.
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

# Main function iDIP.phylo
iDIP.phylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, node] = 1 if node lies on the path from the root to leaf i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }

  # pA: node-by-population abundances (abundance descended from each node)
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) {
    pA[, i] = Temp[, i] %*% M
  }
  pB = c(phyloData$leaves, phyloData$nodes)   # branch lengths
  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n; TT = sum(gp * pB)              # TT: abundance-weighted mean tree depth
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])                        # Faith's PD

  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H - 1] = sum(wi * Ai)

  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else {
          pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
        }
      }
      # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))   # inactive: ai is not defined here
      wi = ni / sum(ni)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i - 1] = sum(wi * Ai)
    }
  }

  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }

  # Gamma = exp(G) / TT; Alpha = exp(A) / TT   # inactive alternative scaling
  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  # out1 = iDIP(abun, struc); out = cbind(out1, out2)   # inactive: out2 is not defined
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha.", (H - 1):1),
                     paste0("PD_beta.", (H - 1):1),
                     paste0("PD_prop.", (H - 1):1),
                     paste0("PD_diff.", (H - 1):1))
  return(out)
}



SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Gaggiotti OE, Chao A, Peres-Neto P, et al. Diversity from genes to ecosystems: A unifying framework to study variation across biological metrics and scales. Evol Appl. 2018;00:1–18. https://doi.org/10.1111/eva.12593

APPENDIX S1. Effective number of alleles: A simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an "ideal" population, i.e. a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the "effective" number of alleles in population 2 is two.

Allele   Population 1   Population 2
A1       0.5            0.01
A2       0.5            0.10
A3       0              0.665
A4       0              0.215
A5       0              0.01
He       0.5            0.5
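The equivalence in the table can be verified in a few lines of R. This is a minimal sketch that assumes the standard heterozygosity-based effective number of alleles, 1 / (1 − He), which equals 1 / Σ p²; the function names `He` and `n_eff` are illustrative.

```r
# Allele frequencies from the table above.
pop1 <- c(0.5, 0.5, 0, 0, 0)
pop2 <- c(0.01, 0.10, 0.665, 0.215, 0.01)

He    <- function(p) 1 - sum(p^2)   # expected heterozygosity
n_eff <- function(p) 1 / sum(p^2)   # effective number of alleles, 1 / (1 - He)

round(c(He_pop1 = He(pop1), He_pop2 = He(pop2)), 2)
round(c(n_eff_pop1 = n_eff(pop1), n_eff_pop2 = n_eff(pop2)), 2)
```

Both populations have an expected heterozygosity of about 0.5, so both map to roughly two equally abundant alleles even though population 2 carries five distinct alleles.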

APPENDIX S2. Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

Symbol                      Definition
${}^qD$                     abundance diversity of order q, also referred to as Hill number of order q
${}^qPD$                    phylogenetic diversity of order q
$S$                         total number of distinct elements (species/alleles)
$H$                         Shannon entropy
$K$                         number of regions
$J_k$                       number of local populations/communities within region k
$w_{jk}$                    weight given to population/community j of region k
$w_{+k}$                    weight given to region k
$D_\alpha^{(l)}$            alpha abundance diversity at level l of the hierarchy
$D_\beta^{(l)}$             beta abundance diversity at level l of the hierarchy
$D_\gamma^{(l)}$            gamma abundance diversity at level l of the hierarchy
$l$                         superscript to denote population/community, subregion, region, …
$D_\gamma$                  gamma abundance diversity at the ecosystem level
$N_{injk}$                  number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
$N_{ijk}$                   total number of copies of allele/species i in population/community j of region k
$N_{+jk}$                   total number of alleles/individuals in population j of region k
$N_{++k}$                   total number of alleles/individuals in region k
$N_{+++}$                   total number of alleles/individuals in the ecosystem
$p_{i|jk}$                  frequency of allele/species i in population j of region k
$p_{i|+k}$                  frequency of allele/species i in region k
$p_{i|++}$                  frequency of allele/species i in the ecosystem
$H_{\alpha jk}^{(1)}$       alpha entropy for population j of region k
$H_{\alpha k}^{(2)}$        alpha entropy for region k
$H_\alpha^{(l)}$            total alpha entropy at level l of the hierarchy
$\Delta_D^{(l)}$            abundance diversity differentiation among aggregates (l = populations/communities, regions)
$B$                         number of branch segments in the phylogenetic tree
$L_i$                       length of branch i = 1, 2, 3, …, B
$a_i$                       total relative abundance of elements (alleles/species) descended from the ith node/branch
$\bar{T}$                   mean branch length
$T$                         depth of an ultrametric tree (= $\bar{T}$)
$Q$                         Rao's quadratic entropy
$I$                         phylogenetic entropy
$a_{i|jk}$                  total relative abundance of alleles/species descended from node i in population/community j of region k
$a_{i|+k}$                  total relative abundance of alleles/species descended from node i across all populations/communities in region k
$a_{i|++}$                  total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
$PD_\alpha^{(l)}$           alpha phylogenetic diversity at level l of the hierarchy
$PD_\beta^{(l)}$            beta phylogenetic diversity at level l of the hierarchy
$PD_\gamma^{(l)}$           gamma phylogenetic diversity at level l of the hierarchy
$PD_\gamma$                 gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$       alpha phylogenetic entropy for population j of region k
$I_{\alpha k}^{(2)}$        alpha phylogenetic entropy for region k
$I_\alpha^{(l)}$            total alpha phylogenetic entropy at level l of the hierarchy
$\Delta_{PD}^{(l)}$         phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)

APPENDIX S3. Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, …, J. For any given population weights $(w_1, w_2, \ldots, w_J)$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j\,p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+}\ln p_{i|+} = -\sum_{i=1}^{S}\left(\sum_{j=1}^{J} w_j\,p_{i|j}\right)\ln\left(\sum_{m=1}^{J} w_m\,p_{i|m}\right)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j\sum_{i=1}^{S} p_{i|j}\ln p_{i|j}.$$

Theorem S3.1. For the gamma and alpha entropies defined above for a two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j\ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j\ln w_j$.

Proof. Since $f(x) = -x\ln x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\left(\sum_{j=1}^{J} w_j\,p_{i|j}\right)\ln\left(\sum_{j=1}^{J} w_j\,p_{i|j}\right) \ge -\sum_{j=1}^{J} w_j\,p_{i|j}\ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S}\left(\sum_{j=1}^{J} w_j\,p_{i|j}\right)\ln\left(\sum_{j=1}^{J} w_j\,p_{i|j}\right) \ge -\sum_{j=1}^{J} w_j\sum_{i=1}^{S} p_{i|j}\ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele i = 1, 2, …, S, i.e. all J populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows:

$$
\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S}\left(\sum_{j=1}^{J} w_j\,p_{i|j}\ln\left(\sum_{m=1}^{J} w_m\,p_{i|m}\right)\right) \le -\sum_{i=1}^{S}\left(\sum_{j=1}^{J} w_j\,p_{i|j}\ln\left(w_j\,p_{i|j}\right)\right) \\
&= -\sum_{j=1}^{J} w_j\sum_{i=1}^{S} p_{i|j}\ln p_{i|j} - \sum_{j=1}^{J} w_j\sum_{i=1}^{S} p_{i|j}\ln w_j = H_\alpha - \sum_{j=1}^{J} w_j\ln w_j.
\end{aligned}
$$

When all J populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
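A small worked case makes the upper bound of Theorem S3.1 concrete. The numbers below are a hypothetical illustration (not from the paper): two populations with equal weights $w_1 = w_2 = 1/2$, the first carrying alleles A1 and A2 at frequency 1/2 each, the second carrying alleles A3 and A4 at frequency 1/2 each, so that no allele is shared.

```latex
\begin{align*}
H_\alpha &= \tfrac12\ln 2 + \tfrac12\ln 2 = \ln 2, \\
p_{i|+} &= \tfrac12\cdot\tfrac12 = \tfrac14 \quad (i = 1,\dots,4)
  \;\Longrightarrow\; H_\gamma = -4\cdot\tfrac14\ln\tfrac14 = \ln 4, \\
H_\gamma - H_\alpha &= \ln 4 - \ln 2 = \ln 2 = -\sum_{j=1}^{2} w_j\ln w_j .
\end{align*}
```

In effective numbers, each population holds $e^{\ln 2} = 2$ alleles and the ecosystem holds $e^{\ln 4} = 4$, so the among-population component reaches its maximum for these weights.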

From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j\ln w_j}. \tag{C1}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:
(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.
(A2) Shannon differentiation measure is always non-decreasing when a new allele is added to a single population with any abundance.
Here the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.
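Property (B) can be checked numerically. The following R sketch builds hypothetical communities (the sizes `N_comm`, `S` and `A` are arbitrary illustrative choices) in which each community has S equally common species, A of them shared by all communities, and assumes equal community weights.

```r
N_comm <- 3; S <- 10; A <- 4              # hypothetical number of communities and species
n_unique <- S - A                         # species unique to each community
n_total  <- A + N_comm * n_unique         # species in the pooled assemblage

# Species-by-community matrix of relative abundances (each column sums to 1).
P <- matrix(0, nrow = n_total, ncol = N_comm)
P[1:A, ] <- 1 / S                                   # species shared by all communities
for (j in 1:N_comm) {
  rows <- A + (j - 1) * n_unique + 1:n_unique       # species found only in community j
  P[rows, j] <- 1 / S
}
w <- rep(1 / N_comm, N_comm)                        # equal community weights (assumed)

# Shannon entropy of a vector of relative abundances (zeros are dropped).
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

H_gamma <- shannon(as.vector(P %*% w))
H_alpha <- sum(w * apply(P, 2, shannon))
Delta   <- (H_gamma - H_alpha) / (-sum(w * log(w)))   # Eq. (C1)

c(Delta = Delta, expected = 1 - A / S)
```

With these settings the Shannon differentiation should equal 1 − A/S, the proportion of non-shared species in a community.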

Proof. For Part (A), see Appendix S6 in Chao, Jost et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad
H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix S2 for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right]. \qquad \text{(C2)}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3: For the decomposition given in Eq. (C2) in the three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C3)}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \qquad \text{(C4)}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have

$$H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} \quad\text{and}\quad H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}.$$

Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} = \sum_{k=1}^{K} w_{+k}\, p_{i|+k}.$$

Then the “gamma” entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S}\left[\left(\sum_{k=1}^{K} w_{+k}\, p_{i|+k}\right)\ln\left(\sum_{k'=1}^{K} w_{+k'}\, p_{i|+k'}\right)\right].$$

The corresponding “alpha” entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem S3.1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele i in population j, with population weights $(w_{1k}/w_{+k},\ w_{2k}/w_{+k},\ \ldots,\ w_{J_k k}/w_{+k})$, $j = 1, 2, \ldots, J_k$. The “gamma” entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding “alpha” entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)}
= -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}
\le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}
= H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4). From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \qquad \text{(C5)}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln\left(w_{jk}/w_{+k}\right)}. \qquad \text{(C6)}$$

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$
\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S}\left[\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln\left(w_{jk}\, p_{i|jk}\right)\right] \\
&= -\sum_{i=1}^{S}\left[\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \left(\ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k}\right)\right] \\
&= -\sum_{i=1}^{S}\left[\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln p_{i|jk}\right]
 - \sum_{i=1}^{S}\left[\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln \frac{w_{jk}}{w_{+k}}\right]
 - \sum_{i=1}^{S}\left[\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln w_{+k}\right] \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right].
\end{aligned}
$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

As shown in the proof of Theorem S3.3, each differentiation measure can be regarded as being based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and “true dissimilarity” properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. Shannon gamma and alpha phylogenetic entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i\, a_{i|++} \ln a_{i|++}, \qquad
I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{B} L_i\, a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{B} L_i\, a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \left[I_\alpha^{(2)} - I_\alpha^{(1)}\right] + \left[I_\gamma - I_\alpha^{(2)}\right]. \qquad \text{(C7)}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4: For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C8)}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \qquad \text{(C9)}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have

$$I_\alpha^{(2)} - I_\alpha^{(1)} = -T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} \quad\text{and}\quad I_\gamma - I_\alpha^{(2)} = -T\sum_{k=1}^{K} w_{+k} \ln w_{+k}.$$

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and “true dissimilarity” properties, are parallel to those of allelic diversity and thus are omitted.

SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represent the ecosystem. Shannon entropies, $H_{\ast}$, are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 diagram: hierarchy levels 3 (ecosystem), 2 (region) and 1 (population or community), with the within, between and total components and their decomposition at each level.]

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set, where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\ p_1 + p_2,\ p_1 + p_2 + p_3,\ p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2 diagram: ultrametric tree with tips p1 to p5, internal nodes p1 + p2, p4 + p5 and p1 + p2 + p3, and the MRCA at the root.]

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures, for Species and Phylogenetic Diversities

Species/Allelic Diversity (R function iDIP)

Input should include two data matrices (called “Abun” and “Struc”, respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as the following:

    Ecosystem
        Region 1: Population 1, Population 2
        Region 2: Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

             Pop1   Pop2   Pop3   Pop4   Pop5
    Allele1     1     16      2     10     15
    Allele2     0      0      0      5     14
    Allele3     7     12     11      1      0
    Allele4     0      5     14      1     21
    Allele5     2      1      0     11     10
    Allele6     0      1      3      2      0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

             Pop1         Pop2         Pop3         Pop4         Pop5
    Level3   Ecosystem    Ecosystem    Ecosystem    Ecosystem    Ecosystem
    Level2   Region1      Region1      Region2      Region2      Region2
    Level1   Population1  Population2  Population3  Population4  Population5

For the above simple example, input data for R function iDIP (code given below) are shown below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3),
                 c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    Struc = rbind(rep("Ecosystem",5),
                  c(rep("Region1",2), rep("Region2",3)),
                  paste0("Population",1:5))

    # Run R function
    iDIP(Data, Struc)

Output is given below.

                       [,1]
    D_gamma           5.272
    D_alpha2          4.679
    D_alpha1          3.540
    D_beta2           1.127
    D_beta1           1.322
    Proportion2       0.300
    Proportion1       0.700
    Differentiation2  0.204
    Differentiation1  0.310

We give simple interpretations for the above output (all the “effective” is in a sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 is interpreted as that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha2 = 4.679 is interpreted as that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 × 1.127 ≈ 5.272 (= D_gamma).

(1c) D_alpha1 = 3.540 is interpreted as that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 is interpreted as that there are 1.32 population equivalents per region. Here 1.322 × 3.540 ≈ 4.679 allele equivalents per region (= D_alpha2).

(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
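The reported values can be reproduced directly from the entropy formulas (C5) and (C6), which may help when checking one's own input. The sketch below is an independent illustration for this three-level example only and is not the iDIP function; the variable names and the `region` membership vector are introduced here for convenience, and weights are taken as relative sizes (counts divided by the total).

```r
# Example data: alleles in rows, populations in columns
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)            # region of each population (from Struc)

# Shannon entropy of a vector of counts (natural log, zeros ignored)
shannon <- function(x) { p <- x[x > 0] / sum(x); -sum(p * log(p)) }

n <- sum(Data)
w_pop <- colSums(Data) / n            # w_jk: relative population sizes
reg_counts <- sapply(sort(unique(region)),
                     function(k) rowSums(Data[, region == k, drop = FALSE]))
w_reg <- colSums(reg_counts) / n      # w_+k: relative region sizes

H_gamma  <- shannon(rowSums(Data))                        # ecosystem level
H_alpha2 <- sum(w_reg * apply(reg_counts, 2, shannon))    # region level
H_alpha1 <- sum(w_pop * apply(Data, 2, shannon))          # population level

# Normalized differentiation measures, Eqs. (C5) and (C6)
diff2 <- (H_gamma - H_alpha2) / (-sum(w_reg * log(w_reg)))
diff1 <- (H_alpha2 - H_alpha1) / (-sum(w_pop * log(w_pop / w_reg[region])))

res <- c(D_gamma = exp(H_gamma), D_alpha2 = exp(H_alpha2), D_alpha1 = exp(H_alpha1),
         Differentiation2 = diff2, Differentiation1 = diff1)
print(round(res, 3))
```

The effective numbers are obtained by exponentiating the entropies, and the beta components follow as ratios, for example `exp(H_gamma) / exp(H_alpha2)` for the regional level.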

    # R code for Information-based Diversity Partitioning for Any Number of Levels
    # input data:
    #   abun:  alleles by population frequency matrix data
    #          (or species by community frequency matrix data)
    #   struc: level by population hierarchical structure matrix
    #          (or level by community hierarchical structure matrix)
    # output:
    #   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
    #   (2) Proportion of total beta information found at each level.
    #   (3) Differentiation (dissimilarity) for each level. For example,
    #       Differentiation1 measures the mean dissimilarity among populations
    #       (level 1) within a region; Differentiation2 measures the mean
    #       dissimilarity among regions (level 2), etc.
    # NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace
    #       "blank" or "NA" in your data by 0 or any numerical value. Also, there
    #       must be at least one species/allele in a population or a community.

    # Main function iDIP
    iDIP = function(abun, struc) {
      n = sum(abun); N = ncol(abun)
      ga = rowSums(abun)
      gp = ga[ga > 0] / n
      G = sum(-gp * log(gp))                       # gamma entropy
      S = length(gp)

      H = nrow(struc)                              # number of levels
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      # lowest level (populations/communities)
      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[H - 1] = sum(wi * Ai)

      # intermediate levels: pool the populations belonging to each aggregate
      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          ai = matrix(0, ncol = NN, nrow = nrow(abun))
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) { ai[, j] = abun[, II]
            } else { ai[, j] = rowSums(abun[, II]) }
          }
          pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
          wi = colSums(ai) / sum(ai)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
          A[i - 1] = sum(wi * Ai)
        }
      }

      total = G - A[H - 1]                         # total beta information
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      Gamma = exp(G); Alpha = exp(A)
      out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) <- c(paste0("D_gamma"),
                         paste0("D_alpha", (H - 1):1),
                         paste0("D_beta", (H - 1):1),
                         paste0("Proportion", (H - 1):1),
                         paste0("Differentiation", (H - 1):1))
      return(out)
    }

Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called “Abun” and “Struc”, respectively) as described in the species diversity, we also need to input a phylogenetic “Tree” (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the “Abun” matrix should be exactly the same as those in the uploaded Newick tree format. iDIP cannot handle “blank” or “NA” entries. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see the simple example given above for the species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity for illustration. A simulated phylogenetic tree for 6 species is given below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3),
                 c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    rownames(Data) = paste0("Allele",1:6)
    Struc = rbind(rep("Ecosystem",5),
                  c(rep("Region1",2), rep("Region2",3)),
                  paste0("Community",1:5))
    Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

    # Run R function
    iDIPphylo(Data, Struc, Tree)

Output is given below.

                    [,1]
    Faith's PD   321.525
    mean_T        94.169
    PD_gamma     274.388
    PD_alpha2    255.194
    PD_alpha1    223.231
    PD_beta2       1.075
    PD_beta1       1.143
    PD_prop2       0.351
    PD_prop1       0.649
    PD_diff2       0.124
    PD_diff1       0.149

The “effective” in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 is interpreted as that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha2 = 255.194 is interpreted as that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 × 1.075 ≈ 274.388 (= PD_gamma).

(3c) PD_alpha1 = 223.231 is interpreted as that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 × 1.143 ≈ 255.194 (= PD_alpha2).

(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
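The first three output values can be checked by hand from the branch lengths and the pooled allele abundances, using the extended abundance set described for Figure S2. The sketch below is an illustration that does not use ade4 or the iDIPphylo function. The branch lengths are those of the example tree, with decimal points placed so that their sum matches the reported Faith's PD; that placement, and the `branches` list encoding which alleles descend from each branch, are assumptions of this sketch.

```r
# Example data: alleles in rows, populations in columns
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
p <- rowSums(Data) / sum(Data)        # pooled allele relative abundances

# Each branch: its length and the alleles (tips) descended from it
branches <- list(
  list(len = 16.66254448, tips = 1),
  list(len = 28.86156926, tips = 2),
  list(len = 59.19367445, tips = 3),
  list(len = 9.67060281,  tips = 4),
  list(len = 49.65919121, tips = 5),
  list(len = 15.361314,   tips = 6),
  list(len = 43.70264926, tips = 1:2),   # ancestor of Allele1, Allele2
  list(len = 43.49065302, tips = 1:3),   # ancestor of Allele1 to Allele3
  list(len = 54.92297125, tips = 4:6)    # ancestor of Allele4 to Allele6
)
L <- sapply(branches, function(b) b$len)            # branch lengths L_i
a <- sapply(branches, function(b) sum(p[b$tips]))   # node abundances a_i

faith_pd <- sum(L)                    # total branch length
mean_T   <- sum(L * a)                # abundance-weighted mean root-to-tip distance
I_gamma  <- -sum(L * a * log(a))      # phylogenetic entropy
PD_gamma <- mean_T * exp(I_gamma / mean_T)   # phylogenetic diversity of order 1

print(round(c(faith_pd = faith_pd, mean_T = mean_T, PD_gamma = PD_gamma), 3))
```

Dividing `PD_gamma` by `mean_T` gives the effective number of equally divergent lineages, here about 2.9.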

    # R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
    # input data:
    #   abun:  alleles by population frequency matrix data
    #          (or species by community frequency matrix data)
    #   struc: level by population hierarchical structure matrix
    #          (or level by community hierarchical structure matrix)
    #   tree:  a Newick-format phylogenetic tree spanned by all focal species
    #          considered in a study
    # NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank"
    #       or "NA" in your data by 0 or any numerical value. Also, there must be
    #       at least one species/allele in a population or a community.
    # output:
    #   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
    #   (2) mean_T: weighted (by species abundance) mean of the distances from the
    #       root node to each of the tips in a phylogenetic tree. For an
    #       ultrametric tree, mean_T = tree depth.
    #   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and
    #       beta PD for each level.
    #   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic
    #       beta information found in Level 1 and Level 2, respectively.
    #   (5) Phylogenetic differentiation (dissimilarity) for each level. For
    #       example, PD_diff1 measures the mean phylogenetic dissimilarity among
    #       communities (Level 1) within a region (Level 2); PD_diff2 measures the
    #       mean phylogenetic dissimilarity among regions (Level 2), etc.

    # Three packages "ade4", "ape" and "phytools" must be installed first
    install.packages("ade4");     library(ade4)
    install.packages("ape");      library(ape)
    install.packages("phytools"); library(phytools)

    # Main function iDIPphylo
    iDIPphylo = function(abun, struc, tree) {
      phyloData <- newick2phylog(tree)
      Temp <- as.matrix(abun[names(phyloData$leaves), ])
      nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
      # M[i, node] = 1 if node lies on the path from the root to leaf i
      M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
                 dimnames = list(names(phyloData$leaves), nodenames))
      for (i in 1:length(phyloData$leaves)) {
        M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
      }
      # pA: abundance of each leaf/node in each population
      pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
                  dimnames = list(nodenames, colnames(abun)))
      for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
      pB = c(phyloData$leaves, phyloData$nodes)    # branch lengths

      n = sum(abun); N = ncol(abun)
      ga = rowSums(pA)
      gp = ga / n; TT = sum(gp * pB)               # mean_T
      G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
      PD = sum(pB[gp > 0])                         # Faith's PD

      H = nrow(struc)
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k)
        -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[H - 1] = sum(wi * Ai)

      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
            } else {
              pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
            }
          }
          wi = ni / sum(ni)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k)
            -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
          A[i - 1] = sum(wi * Ai)
        }
      }

      total = G - A[H - 1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      Gamma = exp(G); Alpha = exp(A)               # effective total branch lengths
      out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) <- c("Faith's PD", "mean_T", "PD_gamma",
                         paste0("PD_alpha", (H - 1):1),
                         paste0("PD_beta", (H - 1):1),
                         paste0("PD_prop", (H - 1):1),
                         paste0("PD_diff", (H - 1):1))
      return(out)
    }


APPENDIX S1. Effective number of alleles: a simple example

Example of two populations that have radically different allele frequencies but belong to the same equivalence class because they have the same expected heterozygosity. Population 2, with five distinct alleles, is equivalent to a population with only two equally abundant alleles. Note also that population 1 has the maximum possible heterozygosity when only two alleles are present. Thus, one can say that it represents an “ideal” population, i.e., a population with the maximum possible diversity given the number of distinct alleles it contains. Therefore, the “effective” number of alleles in population 2 is two.

    Allele   Population 1   Population 2
    A1       0.5            0.01
    A2       0.5            0.10
    A3       0              0.665
    A4       0              0.215
    A5       0              0.01
    He       0.5            0.5
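The equivalence between the two populations can be verified in a few lines of R. This is a minimal sketch using the frequencies from the table; the helper names `het` and `eff_alleles` are introduced here for illustration, and the effective number is computed as the inverse of the expected homozygosity, 1/(1 - He).

```r
# Allele frequencies from the table above
pop1 <- c(0.5, 0.5, 0, 0, 0)
pop2 <- c(0.01, 0.10, 0.665, 0.215, 0.01)

het <- function(p) 1 - sum(p^2)          # expected heterozygosity He
eff_alleles <- function(p) 1 / sum(p^2)  # effective number of alleles

He <- c(pop1 = het(pop1), pop2 = het(pop2))
Ne <- c(pop1 = eff_alleles(pop1), pop2 = eff_alleles(pop2))
print(round(rbind(He, Ne), 3))
```

Both populations have He close to 0.5 and therefore an effective number of alleles close to 2, even though population 2 carries five distinct alleles.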

APPENDIX S2. Table of parameters and variables used

Definitions of the various parameters and variables, following the order in which they appear in the text.

    Symbol                     Definition
    ${}^qD$                    abundance diversity of order q, also referred to as Hill number of order q
    ${}^qPD$                   phylogenetic diversity of order q
    $S$                        total number of distinct elements (species/alleles)
    $H$                        Shannon entropy
    $K$                        number of regions
    $J_k$                      number of local populations/communities within region k
    $w_{jk}$                   weight given to population/community j of region k
    $w_{+k}$                   weight given to region k
    $D_\alpha^{(l)}$           alpha abundance diversity at level l of the hierarchy
    $D_\beta^{(l)}$            beta abundance diversity at level l of the hierarchy
    $D_\gamma^{(l)}$           gamma abundance diversity at level l of the hierarchy
    $l$                        superscript to denote population/community, subregion, region, ...
    $D_\gamma$                 gamma abundance diversity at the ecosystem level
    $N_{injk}$                 number of individuals with n = 0, 1, 2 copies of allele i in population/community j of region k
    $N_{ijk}$                  total number of copies of allele/species i in population/community j of region k
    $N_{+jk}$                  total number of alleles/individuals in population j of region k
    $N_{++k}$                  total number of alleles/individuals in region k
    $N_{+++}$                  total number of alleles/individuals in the ecosystem
    $p_{i|jk}$                 frequency of allele/species i in population j of region k
    $p_{i|+k}$                 frequency of allele/species i in region k
    $p_{i|++}$                 frequency of allele/species i in the ecosystem
    $H_{\alpha jk}^{(1)}$      alpha entropy for population j of region k
    $H_{\alpha k}^{(2)}$       alpha entropy for region k
    $H_\alpha^{(l)}$           total alpha entropy at level l of the hierarchy
    $\Delta_D^{(l)}$           abundance diversity differentiation among aggregates (l = populations/communities, regions)
    $B$                        number of branch segments in the phylogenetic tree
    $L_i$                      length of branch i = 1, 2, 3, ..., B
    $a_i$                      total relative abundance of elements (alleles/species) descended from the ith node/branch
    $\bar{T}$                  mean branch length
    $T$                        depth of an ultrametric tree ($= \bar{T}$)
    $Q$                        Rao's quadratic entropy
    $I$                        phylogenetic entropy
    $a_{i|jk}$                 total relative abundance of alleles/species descended from node i in population/community j of region k
    $a_{i|+k}$                 total relative abundance of alleles/species descended from node i across all populations/communities in region k
    $a_{i|++}$                 total relative abundance of alleles/species descended from node i across all populations and regions in the ecosystem
    $PD_\alpha^{(l)}$          alpha phylogenetic diversity at level l of the hierarchy
    $PD_\beta^{(l)}$           beta phylogenetic diversity at level l of the hierarchy
    $PD_\gamma^{(l)}$          gamma phylogenetic diversity at level l of the hierarchy
    $PD_\gamma$                gamma phylogenetic diversity at the ecosystem level
    $I_{\alpha jk}^{(1)}$      alpha phylogenetic entropy for population j of region k
    $I_{\alpha k}^{(2)}$       alpha phylogenetic entropy for region k
    $I_\alpha^{(l)}$           total alpha phylogenetic entropy at level l of the hierarchy
    $\Delta_{PD}^{(l)}$        phylogenetic diversity differentiation among aggregates (l = populations/communities, regions)

APPENDIX S3. Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term “allele frequency” with “species frequency”. We first present the details for the two-level hierarchy and then extend all procedures to the three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are J populations and S alleles in an ecosystem. Denote the relative frequency of allele i within population j as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any j = 1, 2, ..., J. For any given population weights $(w_1, w_2, \ldots, w_J)$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele i in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S}\left(\sum_{j=1}^{J} w_j p_{i|j}\right)\ln\left(\sum_{j'=1}^{J} w_{j'} p_{i|j'}\right)
\quad\text{and}\quad
H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1: For the gamma and alpha entropies defined above for the two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j .$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the J populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele i we have

$$-\left(\sum_{j=1}^{J} w_j p_{i|j}\right)\ln\left(\sum_{j=1}^{J} w_j p_{i|j}\right) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S}\left(\sum_{j=1}^{J} w_j p_{i|j}\right)\ln\left(\sum_{j=1}^{J} w_j p_{i|j}\right) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves 119867W ge 119867 The Jensen inequality become equality if and only if 119901K|P = 119901K|S =⋯ = 119901K|T for any allele i = 1 2 hellip S ie all J populations have identical allele frequency distributions The maximum value of 119867W minus 119867 is obtained as follows

119867W = minuscdc119908L119901K|L

T

LOP

lndc119908L119901K|[

T

[OP

eeN

KOP

le minuscdc119908L119901K|L ln119908L119901K|L

T

LOP

eN

KOP

= minussum 119908L sum 119901K|L ln 119901K|LNKOP

TLOP minus sum 119908L sum 119901K|L ln119908LN

KOPTLOP = 119867 minus sum 119908L ln119908L

TLOP

When all J populations are completely distinct (no shared alleles) the above inequality becomes an equality The proof is thus completed

4

From the above theorem the normalized differentiation measure (Shannon

differentiation) is formulated as Δg =

hijhkjsum lm nolm

pmqr

(C1)

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles) Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack Theorem S32 Desirable monotonicity and ldquotrue dissimilarityrdquo properties for Shannon differentiation (A) Shannon differentiation (given in Eq C1) satisfies the following monotonicity properties

that heterozygosity-based measures lack (A1) Shannon differentiation always increases when some copies of an allele that is shared

between two or more populations are replaced by copies of an unshared allele (A2) Shannon differentiation measure is always non-decreasing when a new allele is added

to a single population with any abundance Here the population weights can be a set of specified weights or relative population sizes

(B) In addition Shannon differentiation also satisfies the ldquotrue dissimilarityrdquo property If multiple communities each have S equally common species with exactly A species shared by all of them and with the remaining species in each community not shared with any other community then Shannon differentiation measure gives 1-AS the true proportion of non-shared species in a community

Proof For Part (A) see Appendix S6 in Chao Jost et al (2015) for proof details and counter-examples for Part (B) see Chao and Chiu (2016) Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text From Table 3 of the main text Shannon gamma and alpha entropies are expressed as

119867W = minussum 119901K|(( ln 119901K|((NKOP 119867

(S) = minussum 119908(s sum 119901K|(s ln119901K|(sNKOP

tsOP

119867(P) = minussum sum 119908Ls sum 119901K|Ls ln 119901K|LsN

KOPTuLOP

tsOP

(See Appendix B for all notation) The well-known additive decomposition for Shannon entropy is

119867W = 119867(P) + w119867

(S) minus 119867(P)x + w119867W minus 119867

(S)x (C2)

Here 119867(P) denotes the within-population information w119867

(S) minus 119867(P)x denotes the

among-population information within a region and w119867W minus 119867(S)x denotes the

among-region information In the following theorem the maximum value for each of the

5

latter two components is derived so that we can obtain the corresponding normalized differentiation measures Theorem S33 For the decomposition given in Eq (C2) in three-level hierarchy we have the following inequalities

0 le 119867W minus 119867(S) le minussum 119908(s ln119908(st

sOP (C3) 0 le 119867

(S) minus 119867(P) le minussum sum 119908Ls ln

lmulyu

TuLOP

tsOP (C4)

When all populations have identical allele relative frequency distributions we have 119867W =119867(S) = 119867

(P) When all populations are completely distinct (no shared alleles) we have 119867

(S) minus 119867(P) = minussum sum 119908Ls ln

lmulyu

TuLOP

tsOP and 119867W minus 119867

(S) = minussum 119908(s ln119908(stsOP

Proof To prove Eq (C3) we consider the following two-level (ecosystemregion) hierarchy there are K regions with allele relative frequency 119901K|(s for species i in region k with the region weights Q119908(P 119908(S⋯ 119908(TU Note that we can express 119901K|(( as

119901K|(( = sum sum 119908Ls119901K|LsTuLOP

tsOP = sum 119908(s119901K|(st

sOP Then the ldquogammardquo entropy for this two-level system is

119867W = minussum Qsum 119908(s119901K|(slnQsum 119908([119901K|([t[OP Ut

sOP UNKOP

The corresponding ldquoalphardquo entropy for this two-level system is minussum 119908(s sum 119901K|(s ln 119901K|(sN

zOPtsOP which is 119867

(S) Eq (C3) then follows directly from Theorem C1

To prove Eq. (C4), we consider the following two-level hierarchy (region $k$ and all populations within region $k$): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele $i$ in population $j$, with population weights $(w_{1k}/w_{+k},\, w_{2k}/w_{+k},\, \ldots,\, w_{J_k k}/w_{+k})$, $j = 1, 2, \ldots, J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region $k$, i.e.

$$H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k}, \quad\text{where}\quad p_{i|+k} = \sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\,p_{i|jk}.$$

The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\,H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk}.$$

Then Theorem S3.1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\,H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k} + \sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk} \le -\sum_{j=1}^{J_k}\frac{w_{jk}}{w_{+k}}\ln\frac{w_{jk}}{w_{+k}}.$$

Summing over $k$ with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k}\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k} + \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4). From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e. the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k}\ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\left(w_{jk}/w_{+k}\right)}. \tag{C6}$$
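A short worked identity may help when evaluating the denominator of Eq. (C6). It uses only the definition $w_{+k} = \sum_{j} w_{jk}$ and shows that the bound in (C4) is the entropy of the population weights minus the entropy of the region weights, which is the form a numerical implementation would typically compute:

```latex
\begin{aligned}
-\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}}
  &= -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln w_{jk}
     + \sum_{k=1}^{K}\Bigl(\sum_{j=1}^{J_k} w_{jk}\Bigr)\ln w_{+k} \\
  &= \Bigl(-\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln w_{jk}\Bigr)
     - \Bigl(-\sum_{k=1}^{K} w_{+k}\ln w_{+k}\Bigr).
\end{aligned}
```

In particular, the two upper bounds in (C3) and (C4) add up to the entropy of the population weights.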

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma entropy as

$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln\left(w_{jk}\,p_{i|jk}\right) \\
&= -\sum_{i=1}^{S}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\left[\ln p_{i|jk} + \ln\frac{w_{jk}}{w_{+k}} + \ln w_{+k}\right] \\
&= -\sum_{i=1}^{S}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln p_{i|jk}
   -\sum_{i=1}^{S}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln\frac{w_{jk}}{w_{+k}}
   -\sum_{i=1}^{S}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\,p_{i|jk}\ln w_{+k} \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k}\ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right].
\end{aligned}$$

In this special case, we have

$$H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}} \quad\text{and}\quad H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\ln w_{+k}.$$
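The special case above can be illustrated with a small numerical sketch. The count matrix below is hypothetical (three populations with no shared alleles, two of them in the same region), weights are assumed to be relative population sizes, and all variable names are illustrative only.

```r
# Shannon entropy of a vector of counts (zero entries are skipped)
shannon <- function(x) { p <- x[x > 0] / sum(x); -sum(p * log(p)) }

# Hypothetical data: three populations that share no alleles
abun <- cbind(c(4, 6, 0, 0, 0, 0), c(0, 0, 5, 5, 0, 0), c(0, 0, 0, 0, 2, 8))
region <- c(1, 1, 2)  # populations 1 and 2 form region 1, population 3 forms region 2

w_jk <- colSums(abun) / sum(abun)             # population weights
w_k  <- as.vector(tapply(w_jk, region, sum))  # region weights
reg_abun <- t(rowsum(t(abun), region))        # pooled counts per region

H_gamma  <- shannon(rowSums(abun))
H_alpha2 <- sum(w_k * apply(reg_abun, 2, shannon))
H_alpha1 <- sum(w_jk * apply(abun, 2, shannon))

# Normalized differentiation among regions and among populations within regions
delta2 <- (H_gamma - H_alpha2) / (-sum(w_k * log(w_k)))
delta1 <- (H_alpha2 - H_alpha1) / (-sum(w_jk * log(w_jk / w_k[region])))
c(delta2 = delta2, delta1 = delta1)
```

Both measures should equal 1 up to floating-point error, since each information component attains its upper bound when no alleles are shared.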

As we proved in Theorem S3.3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem S3.2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. The phylogenetic gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i=1}^{B} L_i\,a_{i|++}\ln a_{i|++}, \qquad I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\sum_{i=1}^{B} L_i\,a_{i|+k}\ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\sum_{i=1}^{B} L_i\,a_{i|jk}\ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \left[I_\alpha^{(2)} - I_\alpha^{(1)}\right] + \left[I_\gamma - I_\alpha^{(2)}\right]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem S3.4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth $T$:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T\sum_{k=1}^{K} w_{+k}\ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have

$$I_\alpha^{(2)} - I_\alpha^{(1)} = -T\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\ln\frac{w_{jk}}{w_{+k}} \quad\text{and}\quad I_\gamma - I_\alpha^{(2)} = -T\sum_{k=1}^{K} w_{+k}\ln w_{+k}.$$

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.

SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represent the ecosystem. Shannon entropies are calculated from allele/species abundances at each level of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 panel: rows are the hierarchy levels 3 (Ecosystem), 2 (Region) and 1 (Population or Community); columns are Within, Between, Total and Decomposition. The frequencies $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$ (populations), $p_{i|+1}$, $p_{i|+2}$ (regions) and $p_{i|++}$ (ecosystem) give the entropies $H_{\alpha 11}^{(1)}, \ldots, H_{\alpha 52}^{(1)}$, $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$, $H_\alpha^{(1)}$, $H_\alpha^{(2)}$ and $H_\gamma$.]

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/alleles abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species $i$.

[Figure S2 panel: tree with tips $p_1$, $p_2$, $p_3$, $p_4$, $p_5$, internal nodes $p_1 + p_2$, $p_1 + p_2 + p_3$ and $p_4 + p_5$, and the root labelled MRCA.]
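The extended abundance set described for Figure S2 can be sketched in a few lines. The tip abundances and branch lengths below are hypothetical (the figure gives none); the branch lengths are chosen so that the tree is ultrametric with depth 4, and the order-1 effective branch length is written as $T\exp(I/T)$, which is the form obtained from the phylogenetic entropy $I = -\sum_i L_i a_i \ln a_i$ used in this appendix.

```r
# Hypothetical relative abundances of the five tips (species/alleles)
p <- c(0.10, 0.20, 0.30, 0.25, 0.15)

# Extended abundance set for the 8 branches: five tips, then the internal
# nodes (1,2), (1,2,3) and (4,5), in the order given in the caption
a <- c(p, p[1] + p[2], p[1] + p[2] + p[3], p[4] + p[5])

# Hypothetical branch lengths (same order as a); every root-to-tip path sums to 4
L <- c(1, 1, 2, 3, 3, 1, 2, 1)

T_depth  <- sum(L * a)                        # abundance-weighted mean root-to-tip distance
I_phylo  <- -sum(L * a * log(a))              # phylogenetic entropy
PD1      <- T_depth * exp(I_phylo / T_depth)  # effective total branch length (order 1)
faith_PD <- sum(L)                            # total branch length

c(T = T_depth, I = I_phylo, PD1 = PD1, Faith_PD = faith_PD)
```

For an ultrametric tree, `T_depth` equals the tree depth whatever the abundances are, and `PD1` lies between the tree depth and Faith's PD.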

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as follows:

Ecosystem
    Region 1: Population 1, Population 2
    Region 2: Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

            Pop1   Pop2   Pop3   Pop4   Pop5
Allele1        1     16      2     10     15
Allele2        0      0      0      5     14
Allele3        7     12     11      1      0
Allele4        0      5     14      1     21
Allele5        2      1      0     11     10
Allele6        0      1      3      2      0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure of any number of levels can be expressed in a similar manner.

            Pop1          Pop2          Pop3          Pop4          Pop5
Level 3     Ecosystem     Ecosystem     Ecosystem     Ecosystem     Ecosystem
Level 2     Region1       Region1       Region2       Region2       Region2
Level 1     Population1   Population2   Population3   Population4   Population5

For the above simple example, input data for the R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

# Run R function
iDIP(Data, Struc)

Output is given below.

                    [,1]
D_gamma            5.272
D_alpha.2          4.679
D_alpha.1          3.540
D_beta.2           1.127
D_beta.1           1.322
Proportion.2       0.300
Proportion.1       0.700
Differentiation.2  0.204
Differentiation.1  0.310

We give simple interpretations for the above output ("effective" is always in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha.2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta.2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).

(1c) D_alpha.1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta.1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha.2).

(2) Proportion.2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion.1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation.2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation.1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e. the mean proportion of non-shared alleles in a population is around 31.0%.
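The multiplicative relations quoted in the interpretation can be verified directly from the reported (rounded) output. This is only an arithmetic sketch; the variable names are illustrative, and small discrepancies in the third decimal come from rounding of the printed values.

```r
# Values copied from the iDIP output above (rounded to three decimals)
D_gamma  <- 5.272
D_alpha2 <- 4.679; D_beta2 <- 1.127   # region level
D_alpha1 <- 3.540; D_beta1 <- 1.322   # population level

# alpha x beta at one level reproduces the diversity one level up
region_product     <- D_alpha2 * D_beta2   # should be close to D_gamma
population_product <- D_alpha1 * D_beta1   # should be close to D_alpha2

# Share of the total beta information (log scale) found at each level
prop2 <- log(D_beta2) / (log(D_beta2) + log(D_beta1))
prop1 <- 1 - prop2

round(c(region_product, population_product, prop2, prop1), 3)
```

The proportions are computed on the log scale because beta information is a difference of entropies, i.e. the logarithm of a beta diversity.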

# R code for Information-based Diversity Partitioning for Any Number of Levels
# input data:
#   abun:  alleles by population frequency matrix data
#          (or species by community frequency matrix data)
#   struc: level by population hierarchical structure matrix
#          (or level by community hierarchical structure matrix)
# output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
#   (2) Proportion of total beta information found at each level.
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation.1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation.2 measures the mean dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.

# Main function iDIP
iDIP = function(abun, struc){
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga > 0]/n                  # pooled relative frequencies
  G = sum(-gp*log(gp))               # gamma entropy
  S = length(gp)
  H = nrow(struc)                    # number of hierarchical levels
  A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
  Diff = numeric(H-1); Prop = numeric(H-1)
  # lowest level (populations): weights, weight entropy and alpha entropy
  wi = colSums(abun)/n
  W[H-1] = -sum(wi[wi > 0]*log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[,k]/sum(abun[,k]))
  Ai = sapply(1:N, function(k) -sum(pi[,k][pi[,k] > 0]*log(pi[,k][pi[,k] > 0])))
  A[H-1] = sum(wi*Ai)
  # intermediate levels: pool populations within each aggregate
  if(H > 2){
    for(i in 2:(H-1)){
      I = unique(struc[i,]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for(j in 1:NN){
        II = which(struc[i,] == I[j])
        if(length(II) == 1){ ai[,j] = abun[,II]
        }else{ ai[,j] = rowSums(abun[,II]) }
      }
      pi = sapply(1:NN, function(k) ai[,k]/sum(ai[,k]))
      wi = colSums(ai)/sum(ai)
      W[i-1] = -sum(wi*log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[,k][pi[,k] > 0]*log(pi[,k][pi[,k] > 0])))
      A[i-1] = sum(wi*Ai)
    }
  }
  total = G - A[H-1]                 # total beta information
  Diff[1] = (G - A[1])/W[1]
  Prop[1] = (G - A[1])/total
  B[1] = exp(G)/exp(A[1])
  if(H > 2){
    for(i in 2:(H-1)){
      Diff[i] = (A[i-1] - A[i])/(W[i] - W[i-1])
      Prop[i] = (A[i-1] - A[i])/total
      B[i] = exp(A[i-1])/exp(A[i])
    }
  }
  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha.", (H-1):1),
                     paste0("D_beta.", (H-1):1),
                     paste0("Proportion.", (H-1):1),
                     paste0("Differentiation.", (H-1):1))
  return(out)
}
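The NOTE in the code header asks for "blank" or "NA" entries to be replaced and for every population to contain at least one species/allele. One way to enforce this before calling iDIP is sketched below. The helper `prepare_abun` is hypothetical and not part of the original code; it assumes that a missing entry means the allele was not observed, so it is set to 0.

```r
# Hypothetical helper (not part of the original code): clean an abundance
# matrix before passing it to iDIP.
prepare_abun <- function(abun) {
  abun <- as.matrix(abun)
  abun[is.na(abun)] <- 0   # assumption: a missing entry means "not observed"
  if (any(abun < 0)) stop("abundances must be non-negative")
  empty <- colSums(abun) == 0
  if (any(empty)) {
    stop("empty population(s) in column(s): ", paste(which(empty), collapse = ", "))
  }
  abun
}

# Toy matrix with missing entries (alleles in rows, populations in columns)
raw <- cbind(Pop1 = c(1, NA, 7), Pop2 = c(16, 0, NA))
clean <- prepare_abun(raw)
clean
```

Stopping on an empty column, rather than silently dropping it, keeps the abundance matrix aligned with the columns of the structure matrix.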


Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described for the species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree format. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see the simple example given above for the species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity for illustration. A simulated phylogenetic tree for 6 species is given below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) = paste0("Allele",1:6)
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
Tree = c("(((Allele1:1.666254448,Allele2:2.886156926):4.370264926,Allele3:5.919367445):4.349065302,(Allele4:0.967060281,Allele5:4.965919121,Allele6:1.5361314):5.492297125);")

# Run R function
iDIPphylo(Data, Struc, Tree)

Output is given below.

                [,1]
Faith's PD   32.1525
mean_T        9.4169
PD_gamma     27.4388
PD_alpha.2   25.5194
PD_alpha.1   22.3231
PD_beta.2     1.075
PD_beta.1     1.143
PD_prop.2     0.351
PD_prop.1     0.649
PD_diff.2     0.124
PD_diff.1     0.149

The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 32.1525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 9.4169.

(3a) PD_gamma = 27.4388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 27.4388.

(3b) PD_alpha.2 = 25.5194 means that the effective total branch length per region is 25.5194. PD_beta.2 = 1.075 means that there are 1.08 region equivalents. Thus 25.5194 x 1.075 = 27.4388 (= PD_gamma).

(3c) PD_alpha.1 = 22.3231 means that the effective total branch length per population within each region is 22.3231. PD_beta.1 = 1.143 implies that there are 1.14 population equivalents per region. Here 22.3231 x 1.143 = 25.5194 (= PD_alpha.2).

(4) PD_prop.2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop.1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff.2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff.1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e. the mean proportion of non-shared lineages in a community is around 14.9%.
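The reported Faith's PD can be cross-checked against the Newick string without any phylogenetic package, since it is just the sum of all branch lengths. The sketch below assumes the branch lengths of the example tree read as shown in the `tree` string; the parsing uses only base R regular expressions, and the variable names are illustrative.

```r
# Example tree in Newick format (branch lengths follow each colon)
tree <- "(((Allele1:1.666254448,Allele2:2.886156926):4.370264926,Allele3:5.919367445):4.349065302,(Allele4:0.967060281,Allele5:4.965919121,Allele6:1.5361314):5.492297125);"

# Extract every ":<number>" token and strip the leading colon
tokens <- regmatches(tree, gregexpr(":[0-9.]+", tree))[[1]]
branch_lengths <- as.numeric(sub(":", "", tokens))

n_branches <- length(branch_lengths)  # 6 tip branches + 3 internal branches
faith_PD   <- sum(branch_lengths)     # total branch length
round(faith_PD, 4)
```

The sum should agree with the "Faith's PD" row of the output above. Note that this simple parser assumes plain decimal branch lengths (no scientific notation) and taxon names without colons.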

# R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
# input data:
#   abun:  alleles by population frequency matrix data
#          (or species by community frequency matrix data)
#   struc: level by population hierarchical structure matrix
#          (or level by community hierarchical structure matrix)
#   tree:  a Newick-format phylogenetic tree spanned by all focal species considered
#          in a study
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank"
#       or "NA" in your data by 0 or any numerical value. Also, there must be at least
#       one species/allele in a population or a community.
# output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root
#       node to each of the tips in a phylogenetic tree. For an ultrametric tree,
#       mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD
#       for each level.
#   (4) PD_prop.1 and PD_prop.2 measure the proportions of total phylogenetic beta
#       information found in Level 1 and Level 2, respectively.
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example,
#       PD_diff.1 measures the mean phylogenetic dissimilarity among communities
#       (Level 1) within a region (Level 2); PD_diff.2 measures the mean phylogenetic
#       dissimilarity among regions (Level 2), etc.

# Three packages ("ade4", "ape" and "phytools") must be installed first.
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

# Main function iDIPphylo

iDIPphylo = function(abun, struc, tree){
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves),])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, b] = 1 if branch b lies on the path from the root to leaf i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for(i in 1:length(phyloData$leaves)){
    M[i,][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }
  # pA: abundance descended from each branch, per population
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for(i in 1:ncol(abun)){ pA[,i] = Temp[,i] %*% M }
  pB = c(phyloData$leaves, phyloData$nodes)      # branch lengths
  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga/n; TT = sum(gp*pB)                     # TT: abundance-weighted mean tree depth
  G = sum(-pB[gp > 0]*gp[gp > 0]/TT*log(gp[gp > 0]/TT))
  PD = sum(pB[gp > 0])                           # Faith's PD
  H = nrow(struc)
  A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
  Diff = numeric(H-1); Prop = numeric(H-1)
  wi = colSums(abun)/n
  W[H-1] = -sum(wi[wi > 0]*log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[,k]/sum(abun[,k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[,k] > 0]*pi[,k][pi[,k] > 0]/TT*log(pi[,k][pi[,k] > 0]/TT)))
  A[H-1] = sum(wi*Ai)
  if(H > 2){
    for(i in 2:(H-1)){
      I = unique(struc[i,]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for(j in 1:NN){
        II = which(struc[i,] == I[j])
        if(length(II) == 1){ pi[,j] = pA[,II]/sum(abun[,II]); ni[j] = sum(abun[,II])
        }else{ pi[,j] = rowSums(pA[,II])/sum(abun[,II]); ni[j] = sum(abun[,II]) }
      }
      # pi=sapply(1:NN,function(k) ai[,k]/sum(ai[,k]))
      wi = ni/sum(ni)
      W[i-1] = -sum(wi*log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[,k] > 0]*pi[,k][pi[,k] > 0]/TT*log(pi[,k][pi[,k] > 0]/TT)))
      A[i-1] = sum(wi*Ai)
    }
  }
  total = G - A[H-1]
  Diff[1] = (G - A[1])/W[1]
  Prop[1] = (G - A[1])/total
  B[1] = exp(G)/exp(A[1])
  if(H > 2){
    for(i in 2:(H-1)){
      Diff[i] = (A[i-1] - A[i])/(W[i] - W[i-1])
      Prop[i] = (A[i-1] - A[i])/total
      B[i] = exp(A[i-1])/exp(A[i])
    }
  }
  # Gamma=exp(G)/TT;Alpha=exp(A)/TT;Diff=Diff;Prop=Prop
  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  # out1=iDIP(abun,struc)
  # out=cbind(out1,out2)
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha.", (H-1):1),
                     paste0("PD_beta.", (H-1):1),
                     paste0("PD_prop.", (H-1):1),
                     paste0("PD_diff.", (H-1):1))
  return(out)
}


$N_{+jk}$: total number of alleles/individuals in population $j$ of region $k$
$N_{++k}$: total number of alleles/individuals in region $k$
$N_{+++}$: total number of alleles/individuals in the ecosystem
$p_{i|jk}$: frequency of allele/species $i$ in population $j$ of region $k$
$p_{i|+k}$: frequency of allele/species $i$ in region $k$
$p_{i|++}$: frequency of allele/species $i$ in the ecosystem
$H_{\alpha jk}^{(1)}$: alpha entropy for population $j$ of region $k$
$H_{\alpha k}^{(2)}$: alpha entropy for region $k$
$H_{\alpha}^{(l)}$: total alpha entropy at level $l$ of the hierarchy
$\Delta_D^{(l)}$: abundance diversity differentiation among aggregates ($l$ = populations/communities, regions)
$B$: number of branch segments in the phylogenetic tree
$L_i$: length of branch $i = 1, 2, 3, \ldots, B$
$a_i$: total relative abundance of elements (alleles/species) descended from the $i$th node/branch
$\bar{T}$: mean branch length
$T$: depth of an ultrametric tree ($= \bar{T}$)
$Q$: Rao's quadratic entropy
$I$: phylogenetic entropy
$a_{i|jk}$: total relative abundance of alleles/species descended from node $i$ in population/community $j$ of region $k$
$a_{i|+k}$: total relative abundance of alleles/species descended from node $i$ across all populations/communities in region $k$
$a_{i|++}$: total relative abundance of alleles/species descended from node $i$ across all populations and regions in the ecosystem
$PD_{\alpha}^{(l)}$: alpha phylogenetic diversity at level $l$ of the hierarchy
$PD_{\beta}^{(l)}$: beta phylogenetic diversity at level $l$ of the hierarchy
$PD_{\gamma}^{(l)}$: gamma phylogenetic diversity at level $l$ of the hierarchy
$PD_{\gamma}$: gamma phylogenetic diversity at the ecosystem level
$I_{\alpha jk}^{(1)}$: alpha phylogenetic entropy for population $j$ of region $k$
$I_{\alpha k}^{(2)}$: alpha phylogenetic entropy for region $k$
$I_{\alpha}^{(l)}$: total alpha phylogenetic entropy at level $l$ of the hierarchy
$\Delta_{PD}^{(l)}$: phylogenetic diversity differentiation among aggregates ($l$ = populations/communities, regions)

APPENDIX S3. Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity; it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are $J$ populations and $S$ alleles in an ecosystem. Denote the relative frequency of allele $i$ within population $j$ as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any $j = 1, 2, \ldots, J$. For any given population weights $(w_1, w_2, \ldots, w_J)$, $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele $i$ in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j\,p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+}\ln p_{i|+} = -\sum_{i=1}^{S}\left(\sum_{j=1}^{J} w_j\,p_{i|j}\right)\ln\left(\sum_{m=1}^{J} w_m\,p_{i|m}\right)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j\sum_{i=1}^{S} p_{i|j}\ln p_{i|j}.$$

Theorem S3.1. For the gamma and alpha entropies defined above for the two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j\ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the $J$ populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j\ln w_j$.

Proof. Since $f(x) = -x\log x$ is a concave function, it follows from the Jensen inequality that for any allele $i$ we have

$$-\left(\sum_{j=1}^{J} w_j\,p_{i|j}\right)\ln\left(\sum_{j=1}^{J} w_j\,p_{i|j}\right) \ge -\sum_{j=1}^{J} w_j\,p_{i|j}\ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S}\left(\sum_{j=1}^{J} w_j\,p_{i|j}\right)\ln\left(\sum_{j=1}^{J} w_j\,p_{i|j}\right) \ge -\sum_{j=1}^{J} w_j\sum_{i=1}^{S} p_{i|j}\ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele $i = 1, 2, \ldots, S$, i.e. all $J$ populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows. Because $\sum_{m=1}^{J} w_m\,p_{i|m} \ge w_j\,p_{i|j}$ for every $j$ and the logarithm is increasing,

$$\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S}\sum_{j=1}^{J} w_j\,p_{i|j}\ln\left(\sum_{m=1}^{J} w_m\,p_{i|m}\right)
\le -\sum_{i=1}^{S}\sum_{j=1}^{J} w_j\,p_{i|j}\ln\left(w_j\,p_{i|j}\right) \\
&= -\sum_{j=1}^{J} w_j\sum_{i=1}^{S} p_{i|j}\ln p_{i|j} - \sum_{j=1}^{J} w_j\sum_{i=1}^{S} p_{i|j}\ln w_j
= H_\alpha - \sum_{j=1}^{J} w_j\ln w_j.
\end{aligned}$$

When all $J$ populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.

From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j\ln w_j}. \tag{C1}$$
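Eq. (C1) translates into a few lines of code. The following is a minimal sketch rather than the authors' implementation: the function name `shannon_diff` and the toy frequency matrices are illustrative assumptions, and the two calls exercise the limiting cases of Theorem S3.1.

```r
# Shannon entropy of a vector of relative frequencies (zeros are skipped)
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# P: alleles-by-populations matrix of relative frequencies (columns sum to 1)
# w: population weights (summing to 1)
shannon_diff <- function(P, w) {
  H_gamma <- shannon(as.vector(P %*% w))     # entropy of the pooled frequencies
  H_alpha <- sum(w * apply(P, 2, shannon))   # weighted mean within-population entropy
  (H_gamma - H_alpha) / (-sum(w * log(w)))   # Eq. (C1)
}

w <- c(0.3, 0.7)
identical_pops <- cbind(c(0.5, 0.5, 0, 0), c(0.5, 0.5, 0, 0))  # same distribution
distinct_pops  <- cbind(c(0.5, 0.5, 0, 0), c(0, 0, 0.2, 0.8))  # no shared alleles

d_identical <- shannon_diff(identical_pops, w)
d_distinct  <- shannon_diff(distinct_pops, w)
c(identical = d_identical, distinct = d_distinct)
```

Up to floating-point error, the first value is 0 and the second is 1, for any choice of positive weights.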

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the $J$ populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:

(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.

(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: if multiple communities each have $S$ equally common species, with exactly $A$ species shared by all of them, and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives $1 - A/S$, the true proportion of non-shared species in a community.
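Property (B) can be illustrated numerically. The sketch below builds $J$ hypothetical communities with $S$ equally common species each, $A$ of which are shared by all, and evaluates Eq. (C1). Equal community weights are an assumption made here for the illustration, and the function name is not from the original.

```r
# Shannon entropy of a vector of relative frequencies (zeros are skipped)
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

true_dissimilarity_demo <- function(J, S, A) {
  n_species <- A + J * (S - A)             # shared species plus J private sets
  P <- matrix(0, nrow = n_species, ncol = J)
  if (A > 0) P[seq_len(A), ] <- 1 / S      # species shared by all communities
  if (S > A) {
    for (j in seq_len(J)) {
      own <- A + (j - 1) * (S - A) + seq_len(S - A)
      P[own, j] <- 1 / S                   # species found only in community j
    }
  }
  w <- rep(1 / J, J)                       # assumption: equal community weights
  H_gamma <- shannon(as.vector(P %*% w))
  H_alpha <- sum(w * apply(P, 2, shannon))
  (H_gamma - H_alpha) / (-sum(w * log(w)))
}

delta <- true_dissimilarity_demo(J = 3, S = 10, A = 4)
c(shannon_differentiation = delta, expected = 1 - 4 / 10)
```

With 4 of 10 species shared, the measure returns 0.6, the proportion of non-shared species in each community.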

Proof. For Part (A), see Appendix S6 in Chao, Jost et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++}\ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k}\sum_{i=1}^{S} p_{i|+k}\ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\sum_{i=1}^{S} p_{i|jk}\ln p_{i|jk}.$$

(See Appendix B for all notation) The well-known additive decomposition for Shannon entropy is

119867W = 119867(P) + w119867

(S) minus 119867(P)x + w119867W minus 119867

(S)x (C2)

Here 119867(P) denotes the within-population information w119867

(S) minus 119867(P)x denotes the

among-population information within a region and w119867W minus 119867(S)x denotes the

among-region information In the following theorem the maximum value for each of the

5

latter two components is derived so that we can obtain the corresponding normalized differentiation measures Theorem S33 For the decomposition given in Eq (C2) in three-level hierarchy we have the following inequalities

0 le 119867W minus 119867(S) le minussum 119908(s ln119908(st

sOP (C3) 0 le 119867

(S) minus 119867(P) le minussum sum 119908Ls ln

lmulyu

TuLOP

tsOP (C4)

When all populations have identical allele relative frequency distributions we have 119867W =119867(S) = 119867

(P) When all populations are completely distinct (no shared alleles) we have 119867

(S) minus 119867(P) = minussum sum 119908Ls ln

lmulyu

TuLOP

tsOP and 119867W minus 119867

(S) = minussum 119908(s ln119908(stsOP

Proof To prove Eq (C3) we consider the following two-level (ecosystemregion) hierarchy there are K regions with allele relative frequency 119901K|(s for species i in region k with the region weights Q119908(P 119908(S⋯ 119908(TU Note that we can express 119901K|(( as

119901K|(( = sum sum 119908Ls119901K|LsTuLOP

tsOP = sum 119908(s119901K|(st

sOP Then the ldquogammardquo entropy for this two-level system is

119867W = minussum Qsum 119908(s119901K|(slnQsum 119908([119901K|([t[OP Ut

sOP UNKOP

The corresponding ldquoalphardquo entropy for this two-level system is minussum 119908(s sum 119901K|(s ln 119901K|(sN

zOPtsOP which is 119867

(S) Eq (C3) then follows directly from Theorem C1

To prove Eq (C4) we consider the following two-level hierarchy (region k and all populations within region k) there are Jk populations with allele relative frequency 119901K|Ls for

allele i in population j with population weights lrulyu

l|ulyu

⋯ lpuulyu

119895 = 12⋯ 119869s The

ldquogammardquo entropy for this two-level hierarchy is the entropy value for region k ie 119867s(S) =

minussum 119901K|(s ln 119901K|(sNKOP where 119901K|(s = sum lmu

lyu119901K|Ls

TuLOP The corresponding ldquoalphardquo entropy is

sum lmulyu

119867Ls(P)Tu

LOP = minussum lmulyu

sum 119901K|LsNKOP ln 119901K|Ls

TuLOP Then Theorem C1 leads to

119867s(S) minusc

119908Ls119908(s

119867Ls(P)

Tu

LOP

= minusc119901K|(s ln 119901K|(s

N

KOP

+c119908Ls119908(s

c119901K|Ls

N

KOP

ln 119901K|Ls

Tu

LOP

le minusc119908Ls119908(s

ln119908Ls119908(s

Tu

LOP

Summing over k with weight 119908(sin both sides of the above inequality we obtain

minussum 119908(s sum 119901K|(s ln 119901K|(sNKOP

tsOP + sum sum 119908Ls sum 119901K|LsN

KOP ln 119901K|LsTuLOP

tsOP = 119867

(S) minus 119867(P) le

minussum 119908Ls lnlmulyu

TuLOP

This proves Eq (C4) From the above theorem we have 0 le 119867

(P) le 119867(S) le 119867W ie the gamma diversity of

any level is greater than or equal to the corresponding alpha diversity at the same level Eq (C3) leads to the following normalized differentiation measure among regions

Δg(S) = hijhk

(|)

jsum lyu nolyuuqr

(C5)

6

Likewise Eq (C4) leads to the following normalized differentiation measure among populations within a region

Δg(P) = hk

(|)jhk(r)

jsum sum lmu noQlmulyuUpumqr

uqr

(C6)

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions and it takes the maximum value of 1 when all populations are completely distinct (no shared species) Note that in the latter case we can decompose the gamma diversity as

119867W = minuscdcc119908Ls119901K|Ls lnQ119908Ls119901K|LsUTu

LOP

t

sOP

eN

KOP

= minuscdcc119908Ls119901K|Ls ln 119901K|Ls + ln119908Ls119908(s

+ ln119908(sTu

LOP

t

sOP

eN

KOP

= minuscdcc119908Ls119901K|Ls ln 119901K|Ls

Tu

LOP

t

sOP

eN

KOP

minuscdcc119908Ls119901K|Ls ln119908Ls119908(s

Tu

LOP

t

sOP

eN

KOP

minuscdcc119908Ls119901K|Ls ln119908(s

Tu

LOP

t

sOP

eN

KOP

= 119867(P) minuscc119908Ls ln

119908Ls119908(s

Tu

LOP

t

sOP

minusc119908(s ln119908(s

t

sOP

equiv 119867(P) + w119867

(S) minus 119867(P)x + w119867W minus 119867

(S)x

In this special case we have 119867(S) minus 119867

(P) = minussum sum 119908Ls lnlmulyu

TuLOP

tsOP and 119867W minus 119867

(S) =minussum 119908(s ln119908(st

sOP

As we proved in Theorem C3 each differentiation measure can be regarded as based on a two-level hierarchy Thus each differentiation satisfies the corresponding monotonicity and ldquotrue dissimilarityrdquo properties stated in Theorem C2 Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees all derivation steps are parallel to those of allelic diversity Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text Shannon gamma and alpha entropies are expressed as (Table 3 of the main text)

119868W = minussum 119871K119886K|(( ln 119886K|((KOP 119868

(S) = minussum 119908(s sum 119871K119886K|(s ln 119886K|(sKOP

tsOP

119868(P) = minussum sum 119908Ls sum 119871K119901K|Ls ln119901K|Ls

KOPTuLOP

tsOP

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \left[I_\alpha^{(2)} - I_\alpha^{(1)}\right] + \left[I_\gamma - I_\alpha^{(2)}\right]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem C4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth $T$:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have

$$I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} \quad \text{and} \quad I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}.$$

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
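The bound in Theorem C4 can be checked numerically. The sketch below is only a two-level analogue (populations within a single ecosystem), not the full three-level hierarchy. The toy tree, the tip-by-branch matrix `M` and the helpers `phylo_entropy` and `phylo_diff` are hypothetical names introduced for illustration; they are not part of the iDIP code.

```r
# Toy ultrametric tree ((A:1,B:1):1,(C:1,D:1):1) with depth T = 2 (illustrative assumption)
L <- c(A = 1, B = 1, C = 1, D = 1, AB = 1, CD = 1)   # branch lengths
# M[tip, branch] = 1 if the branch lies on the path from the root to the tip
M <- rbind(A = c(1, 0, 0, 0, 1, 0),
           B = c(0, 1, 0, 0, 1, 0),
           C = c(0, 0, 1, 0, 0, 1),
           D = c(0, 0, 0, 1, 0, 1))
colnames(M) <- names(L)
T_depth <- 2

# Phylogenetic entropy -sum_i L_i a_i ln a_i for tip relative abundances p
phylo_entropy <- function(p) {
  a <- as.vector(p %*% M)          # branch (node) abundances
  keep <- a > 0
  -sum(L[keep] * a[keep] * log(a[keep]))
}

# Normalized differentiation for populations in the columns of P with weights w
phylo_diff <- function(P, w) {
  I_gamma <- phylo_entropy(as.vector(P %*% w))
  I_alpha <- sum(w * apply(P, 2, phylo_entropy))
  (I_gamma - I_alpha) / (-T_depth * sum(w * log(w)))
}

w <- c(0.5, 0.5)
P_distinct  <- cbind(c(0.5, 0.5, 0, 0), c(0, 0, 0.5, 0.5))  # no shared branches
P_identical <- cbind(rep(0.25, 4), rep(0.25, 4))            # same distribution

diff_distinct  <- phylo_diff(P_distinct, w)
diff_identical <- phylo_diff(P_identical, w)
print(c(distinct = diff_distinct, identical = diff_identical))
```

Under these assumptions the two cases sit at the two ends of the [0, 1] range: the phylogenetically distinct populations reach the upper bound and the identical populations reach the lower one.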

SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue rectangles represent regions and the red rectangle represents the ecosystem. Shannon entropies $H_{*}$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 labels. Columns: Within, Between, Total, Decomposition. Rows: 3 Ecosystem ($p_{i|++}$; $H_\gamma$), 2 Region ($p_{i|+1}$, $p_{i|+2}$; $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$; $H_\alpha^{(2)}$), 1 Population or Community ($p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$; $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$, $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$; $H_\alpha^{(1)}$).]

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \cdots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \cdots, p_5,\ p_1 + p_2,\ p_1 + p_2 + p_3,\ p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2 labels. Tips: p1, p2, p3, p4, p5. Internal nodes: p1 + p2, p4 + p5, p1 + p2 + p3. Root: MRCA.]
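The extended abundance set described in the Figure S2 caption can be sketched in a few lines of R. The abundance values and the node labels `n6`, `n7`, `n8` below are illustrative assumptions, not values from the paper.

```r
# Relative abundances of the five species (illustrative values)
p <- c(p1 = 0.10, p2 = 0.20, p3 = 0.30, p4 = 0.25, p5 = 0.15)

# Internal nodes of the Figure S2 tree, each listed with its descendant tips
internal <- list(n6 = c("p1", "p2"),
                 n7 = c("p1", "p2", "p3"),
                 n8 = c("p4", "p5"))

# Extended set: tip abundances followed by the summed abundance below each internal node
a <- c(p, sapply(internal, function(tips) sum(p[tips])))
print(a)
```

The first five entries of `a` are the tip abundances themselves; the last three are the abundances of the elements descended from each internal node.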

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region 1
        Population 1, Population 2
    Region 2
        Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

           Pop1   Pop2   Pop3   Pop4   Pop5
Allele1       1     16      2     10     15
Allele2       0      0      0      5     14
Allele3       7     12     11      1      0
Allele4       0      5     14      1     21
Allele5       2      1      0     11     10
Allele6       0      1      3      2      0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure of any number of levels can be expressed in a similar manner.

          Pop1          Pop2          Pop3          Pop4          Pop5
Level3    Ecosystem     Ecosystem     Ecosystem     Ecosystem     Ecosystem
Level2    Region1       Region1       Region2       Region2       Region2
Level1    Population1   Population2   Population3   Population4   Population5

For the above simple example, input data for R function iDIP (code given below) are shown below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

# Run R function
iDIP(Data, Struc)

Output is given below.

                    [,1]
D_gamma            5.272
D_alpha2           4.679
D_alpha1           3.540
D_beta2            1.127
D_beta1            1.322
Proportion2        0.300
Proportion1        0.700
Differentiation2   0.204
Differentiation1   0.310

We give simple interpretations for the above output ("effective" is always in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).

(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).

(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
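The multiplicative relations quoted in (1b) and (1c) can be reproduced directly from the example data. The following base-R sketch is independent of the iDIP function. The helper `shannon` and the vector `region` are names introduced here for illustration, and population weights are assumed to be relative population sizes.

```r
# Example data from the text: alleles in rows, populations in columns
Data <- cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3),
              c(10,5,1,1,11,2), c(15,14,0,21,10,0))
region <- c(1, 1, 2, 2, 2)            # region membership of each population

# Shannon entropy of a vector of counts (zero counts contribute nothing)
shannon <- function(x) { p <- x[x > 0] / sum(x); -sum(p * log(p)) }

n <- sum(Data)
H_gamma  <- shannon(rowSums(Data))
# Population-level alpha entropy: weights are relative population sizes
H_alpha1 <- sum(colSums(Data) / n * apply(Data, 2, shannon))
# Region-level alpha entropy: pool the populations of each region first
Reg <- sapply(sort(unique(region)),
              function(r) rowSums(Data[, region == r, drop = FALSE]))
H_alpha2 <- sum(colSums(Reg) / n * apply(Reg, 2, shannon))

D_gamma  <- exp(H_gamma)
D_alpha2 <- exp(H_alpha2)
D_alpha1 <- exp(H_alpha1)
D_beta2  <- D_gamma / D_alpha2        # region equivalents
D_beta1  <- D_alpha2 / D_alpha1       # population equivalents per region
print(round(c(D_gamma, D_alpha2, D_alpha1, D_beta2, D_beta1), 3))
```

Because each beta is defined as a ratio of effective numbers, the product D_alpha1 x D_beta1 x D_beta2 recovers D_gamma exactly.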

R code for Information-based Diversity Partitioning for Any Number of Levels

# input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community frequency matrix data).
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix).
# output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
#   (2) Proportion of total beta information found at each level.
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1 measures the mean
#       dissimilarity among populations (level 1) within a region; Differentiation2 measures the mean
#       dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entry. You must replace "blank" or "NA" in your data
#       by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

# Main function iDIP
iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga > 0] / n
  G = sum(-gp * log(gp))
  S = length(gp)

  H = nrow(struc)
  A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
  Diff = numeric(H-1); Prop = numeric(H-1)

  wi = colSums(abun) / n
  W[H-1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H-1] = sum(wi * Ai)
  if (H > 2) {
    for (i in 2:(H-1)) {
      I = unique(struc[i, ]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) { ai[, j] = abun[, II]
        } else { ai[, j] = rowSums(abun[, II]) }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i-1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i-1] = sum(wi * Ai)
    }
  }
  total = G - A[H-1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H-1)) {
      Diff[i] = (A[i-1] - A[i]) / (W[i] - W[i-1])
      Prop[i] = (A[i-1] - A[i]) / total
      B[i] = exp(A[i-1]) / exp(A[i])
    }
  }
  Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha", (H-1):1),
                     paste0("D_beta", (H-1):1),
                     paste0("Proportion", (H-1):1),
                     paste0("Differentiation", (H-1):1)
  )
  return(out)
}
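Since iDIP cannot handle "blank" or "NA" entries and requires at least one species/allele per population, inputs can be checked before the function is called. The helper `prepare_abun` below is hypothetical (it is not part of the published code), and it assumes that missing entries should be read as zero counts.

```r
# Hypothetical helper: prepares an abundance matrix so that it satisfies
# the input requirements stated in the NOTE for iDIP.
prepare_abun <- function(abun) {
  abun <- as.matrix(abun)
  abun[is.na(abun)] <- 0                      # replace NA entries by 0 (assumption)
  empty <- which(colSums(abun) == 0)          # populations with no species/allele
  if (length(empty) > 0) {
    stop("empty population(s) in column(s): ", paste(empty, collapse = ", "))
  }
  abun
}

raw <- cbind(Pop1 = c(1, NA, 7), Pop2 = c(16, 0, NA))
clean <- prepare_abun(raw)
print(clean)
```

Stopping on an empty column, rather than silently dropping it, keeps the columns of the abundance matrix aligned with the columns of the structure matrix.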

Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described in the species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree format. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying hierarchical structure matrix; see the simple example given above for the species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundances data as in the species diversity for illustration. A simulated phylogenetic tree for 6 species is given below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) = paste0("Allele", 1:6)
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

# Run R function
iDIPphylo(Data, Struc, Tree)

Output is given below.

                [,1]
Faith's PD   321.525
mean_T        94.169
PD_gamma     274.388
PD_alpha2    255.194
PD_alpha1    223.231
PD_beta2       1.075
PD_beta1       1.143
PD_prop2       0.351
PD_prop1       0.649
PD_diff2       0.124
PD_diff1       0.149

The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma).

(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2).

(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
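Items (1) and (2) can be reproduced without any phylogenetics package by writing the example tree out by hand. In the sketch below, the internal-branch labels `I12`, `I123` and `I456` and the `paths` list are illustrative assumptions. The branch lengths are those of the simulated six-allele tree, with decimal points placed so that the sums agree with the reported Faith's PD and mean_T.

```r
# Branch lengths of the example tree: six tips, then three internal branches
L <- c(Allele1 = 16.66254448, Allele2 = 28.86156926, Allele3 = 59.19367445,
       Allele4 = 9.67060281,  Allele5 = 49.65919121, Allele6 = 15.361314,
       I12 = 43.70264926, I123 = 43.49065302, I456 = 54.92297125)
# Internal branches on the path from the root to each tip
paths <- list(Allele1 = c("I12", "I123"), Allele2 = c("I12", "I123"),
              Allele3 = "I123",
              Allele4 = "I456", Allele5 = "I456", Allele6 = "I456")

Data <- cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3),
              c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) <- paste0("Allele", 1:6)
p <- rowSums(Data) / sum(Data)               # pooled relative abundances

faith_PD <- sum(L)                           # total branch length
depth <- sapply(names(p), function(tip) L[[tip]] + sum(L[paths[[tip]]]))
mean_T <- sum(p * depth)                     # abundance-weighted root-to-tip distance
print(round(c(faith_PD = faith_PD, mean_T = mean_T), 3))
```

The tip depths differ from one another, so this simulated tree is not ultrametric, which is why mean_T is an abundance-weighted mean rather than a single tree depth.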

R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

# input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community frequency matrix data).
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix).
#   tree:  a Newick-format phylogenetic tree spanned by all focal species considered in a study.
# NOTE: iDIP cannot handle "blank" or "NA" entry. You must replace "blank" or "NA" in your data by 0
#       or any numerical value. Also, there must be at least one species/allele in a population or a community.
# output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
#   (2) mean_T: weighted (by species abundance) mean of the distances from root node to each of the tips
#       in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level.
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information found in
#       Level 1 and Level 2, respectively.
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1 measures the mean
#       phylogenetic dissimilarity among communities (Level 1) within a region (Level 2); PD_diff2 measures
#       the mean phylogenetic dissimilarity among regions (Level 2), etc.

# Three packages "ade4", "ape" and "phytools" must be installed first.
install.packages("ade4"); library(ade4)
install.packages("ape"); library(ape)
install.packages("phytools"); library(phytools)

# Main function iDIPphylo
iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[leaf, node] = 1 if the node lies on the path from the root to the leaf
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }

  # pA: node abundances in each population; pB: branch lengths
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
  pB = c(phyloData$leaves, phyloData$nodes)

  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n; TT = sum(gp * pB)
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])

  H = nrow(struc)
  A = numeric(H-1); W = numeric(H-1); B = numeric(H-1)
  Diff = numeric(H-1); Prop = numeric(H-1)

  wi = colSums(abun) / n
  W[H-1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H-1] = sum(wi * Ai)

  if (H > 2) {
    for (i in 2:(H-1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) { pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else { pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II]) }
      }
      # pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = ni / sum(ni)
      W[i-1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i-1] = sum(wi * Ai)
    }
  }
  total = G - A[H-1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H-1)) {
      Diff[i] = (A[i-1] - A[i]) / (W[i] - W[i-1])
      Prop[i] = (A[i-1] - A[i]) / total
      B[i] = exp(A[i-1]) / exp(A[i])
    }
  }
  Gamma = exp(G) * TT; Alpha = exp(A) * TT; Diff = Diff; Prop = Prop
  # Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  # out1 = iDIP(abun, struc)
  # out = cbind(out1, out2)
  rownames(out) <- c(paste("Faith's PD"),
                     paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H-1):1),
                     paste0("PD_beta", (H-1):1),
                     paste0("PD_prop", (H-1):1),
                     paste0("PD_diff", (H-1):1)
  )
  return(out)
}


APPENDIX S3 Derivation of Differentiation Measures

We derive all differentiation measures in terms of allele frequencies, but note that all derivations are valid for species diversity: it suffices to replace the term "allele frequency" with "species frequency". We first present the details for a two-level hierarchy and then extend all procedures to a three-level hierarchy. Generalization to an arbitrary number of levels is parallel.

Shannon differentiation measure in two-level hierarchy (ecosystem and populations)

Assume that there are $J$ populations and $S$ alleles in an ecosystem. Denote the relative frequency of allele $i$ within population $j$ as $p_{i|j}$, with $\sum_{i=1}^{S} p_{i|j} = 1$ for any $j = 1, 2, \ldots, J$. For any given population weights $(w_1, w_2, \cdots, w_J)$ with $\sum_{j=1}^{J} w_j = 1$, the relative frequency of allele $i$ in the ecosystem becomes $p_{i|+} = \sum_{j=1}^{J} w_j p_{i|j}$. The gamma and alpha Shannon entropies can be expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|+} \ln p_{i|+} = -\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln \Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)$$

and

$$H_\alpha = -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

Theorem S3.1. For the gamma and alpha entropies defined above for the two-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha \le -\sum_{j=1}^{J} w_j \ln w_j.$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma - H_\alpha = 0$; when the $J$ populations are completely distinct (no shared alleles), we have $H_\gamma - H_\alpha = -\sum_{j=1}^{J} w_j \ln w_j$.

Proof. Since $f(x) = -x \log x$ is a concave function, it follows from the Jensen inequality that for any allele $i$ we have

$$-\Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j p_{i|j} \ln p_{i|j}.$$

Summing over all alleles, we then obtain

$$-\sum_{i=1}^{S} \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ln \Big(\sum_{j=1}^{J} w_j p_{i|j}\Big) \ge -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j}.$$

This proves $H_\gamma \ge H_\alpha$. The Jensen inequality becomes an equality if and only if $p_{i|1} = p_{i|2} = \cdots = p_{i|J}$ for any allele $i = 1, 2, \ldots, S$, i.e., all $J$ populations have identical allele frequency distributions. The maximum value of $H_\gamma - H_\alpha$ is obtained as follows, using $\sum_{m} w_m p_{i|m} \ge w_j p_{i|j}$ for every $j$:

$$H_\gamma = -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln \Big(\sum_{m=1}^{J} w_m p_{i|m}\Big)\Big] \le -\sum_{i=1}^{S} \Big[\sum_{j=1}^{J} w_j p_{i|j} \ln \big(w_j p_{i|j}\big)\Big]$$

$$= -\sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln p_{i|j} - \sum_{j=1}^{J} w_j \sum_{i=1}^{S} p_{i|j} \ln w_j = H_\alpha - \sum_{j=1}^{J} w_j \ln w_j.$$

When all $J$ populations are completely distinct (no shared alleles), the above inequality becomes an equality. The proof is thus completed.
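The two bounds of Theorem S3.1 are easy to check numerically. In the sketch below, the helper names `H` and `gap_and_bound`, the weights and the frequency matrices are all illustrative assumptions. Populations are stored in the columns of `P`.

```r
# Shannon entropy of a vector of relative frequencies
H <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# Gamma minus alpha entropy and its upper bound for populations in the columns of P
gap_and_bound <- function(P, w) {
  H_gamma <- H(as.vector(P %*% w))
  H_alpha <- sum(w * apply(P, 2, H))
  c(gap = H_gamma - H_alpha, bound = -sum(w * log(w)))
}

w <- c(0.2, 0.3, 0.5)                                        # arbitrary weights summing to 1
P_identical <- matrix(c(0.5, 0.3, 0.2), nrow = 3, ncol = 3)  # same distribution in every column
P_distinct  <- rbind(diag(3) * 0.6, diag(3) * 0.4)           # 6 alleles, none shared
set.seed(1)
P_random <- apply(matrix(runif(12), 4, 3), 2, function(x) x / sum(x))

res_identical <- gap_and_bound(P_identical, w)
res_distinct  <- gap_and_bound(P_distinct, w)
res_random    <- gap_and_bound(P_random, w)
print(rbind(identical = res_identical, distinct = res_distinct, random = res_random))
```

Identical populations give a gap of zero, completely distinct populations attain the bound, and an arbitrary set of populations falls in between.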

From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the $J$ populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem S3.2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:

(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.

(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: if multiple communities each have $S$ equally common species, with exactly $A$ species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives $1 - A/S$, the true proportion of non-shared species in a community.
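Part (B) can be illustrated with a small numerical sketch. The function `shannon_diff_shared` and the chosen values of N, S and A are hypothetical; equal community weights are used here only for simplicity.

```r
# Shannon entropy of a vector of relative frequencies
H <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

# N communities, each with S equally common species, A of them shared by all
shannon_diff_shared <- function(N, S, A) {
  n_species <- A + N * (S - A)
  P <- matrix(0, nrow = n_species, ncol = N)
  for (j in seq_len(N)) {
    if (A > 0) P[seq_len(A), j] <- 1 / S               # shared species
    if (S > A) {
      own <- A + (j - 1) * (S - A) + seq_len(S - A)     # species unique to community j
      P[own, j] <- 1 / S
    }
  }
  w <- rep(1 / N, N)                                    # equal weights (assumption)
  H_gamma <- H(as.vector(P %*% w))
  H_alpha <- sum(w * apply(P, 2, H))
  (H_gamma - H_alpha) / (-sum(w * log(w)))              # Eq. (C1)
}

delta <- shannon_diff_shared(N = 4, S = 10, A = 3)
print(c(delta = delta, expected = 1 - 3 / 10))
```

With 3 of 10 species shared, the measure returns the proportion of non-shared species, 0.7, in agreement with the stated property.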

Proof. For Part (A), see Appendix S6 in Chao, Jost et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++}, \qquad H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix B for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem S3.3. For the decomposition given in Eq. (C2) in the three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have

$$H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} \quad \text{and} \quad H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}.$$

Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are $K$ regions with allele relative frequency $p_{i|+k}$ for allele $i$ in region $k$, with the region weights $(w_{+1}, w_{+2}, \cdots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} = \sum_{k=1}^{K} w_{+k} p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \Big(\sum_{k=1}^{K} w_{+k} p_{i|+k}\Big) \ln \Big(\sum_{m=1}^{K} w_{+m} p_{i|+m}\Big).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem C1.

To prove Eq. (C4), we consider the following two-level hierarchy (region $k$ and all populations within region $k$): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele $i$ in population $j$, $j = 1, 2, \cdots, J_k$, with population weights $\left(\frac{w_{1k}}{w_{+k}}, \frac{w_{2k}}{w_{+k}}, \cdots, \frac{w_{J_k k}}{w_{+k}}\right)$. The "gamma" entropy for this two-level hierarchy is the entropy value for region $k$, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} p_{i|jk}$. The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem C1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over $k$ with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \left(w_{jk}/w_{+k}\right)}. \tag{C6}$$

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case, we can decompose the gamma diversity as

$$H_\gamma = -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \big(w_{jk} p_{i|jk}\big)\Big]$$

$$= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \Big(\ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k}\Big)\Big]$$

$$= -\sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln p_{i|jk}\Big] - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln \frac{w_{jk}}{w_{+k}}\Big] - \sum_{i=1}^{S} \Big[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} p_{i|jk} \ln w_{+k}\Big]$$

$$= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k}$$

$$\equiv H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right].$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
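The normalized measures in Eqs. (C5) and (C6) can be evaluated directly on a small three-level data set. The sketch below uses a six-allele, five-population count matrix with two regions. The variable names are illustrative, and population weights are assumed to be relative population sizes.

```r
# Alleles in rows, populations in columns; populations 1-2 form region 1, 3-5 form region 2
Data <- cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3),
              c(10,5,1,1,11,2), c(15,14,0,21,10,0))
region <- c(1, 1, 2, 2, 2)

# Shannon entropy of a vector of relative frequencies
H <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

n    <- sum(Data)
w_jk <- colSums(Data) / n                          # population weights w_{jk}
w_k  <- sapply(split(w_jk, region), sum)           # region weights w_{+k}
P    <- sweep(Data, 2, colSums(Data), "/")         # p_{i|jk}

H_gamma  <- H(as.vector(P %*% w_jk))
H_alpha1 <- sum(w_jk * apply(P, 2, H))
H_alpha2 <- sum(sapply(names(w_k), function(k) {
  cols <- region == as.numeric(k)
  p_k <- as.vector(P[, cols, drop = FALSE] %*% w_jk[cols]) / w_k[[k]]   # p_{i|+k}
  w_k[[k]] * H(p_k)
}))

Delta2 <- (H_gamma - H_alpha2) / (-sum(w_k * log(w_k)))                                  # Eq. (C5)
Delta1 <- (H_alpha2 - H_alpha1) / (-sum(w_jk * log(w_jk / w_k[as.character(region)])))   # Eq. (C6)
print(round(c(Delta2 = Delta2, Delta1 = Delta1), 3))
```

Both values lie strictly between 0 and 1, as expected for populations that share some but not all alleles.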

As we proved in Theorem C3 each differentiation measure can be regarded as based on a two-level hierarchy Thus each differentiation satisfies the corresponding monotonicity and ldquotrue dissimilarityrdquo properties stated in Theorem C2 Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees all derivation steps are parallel to those of allelic diversity Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text Shannon gamma and alpha entropies are expressed as (Table 3 of the main text)

119868W = minussum 119871K119886K|(( ln 119886K|((KOP 119868

(S) = minussum 119908(s sum 119871K119886K|(s ln 119886K|(sKOP

tsOP

119868(P) = minussum sum 119908Ls sum 119871K119901K|Ls ln119901K|Ls

KOPTuLOP

tsOP

Corresponding to Eq (C2) we have a similar decomposition for phylogenetic entropy

119868W = 119868(P) + w119868

(S) minus 119868(P)x + w119868W minus 119868

(S)x (C7)

7

Here 119868(P) denotes the within-population phylogenetic information w119868

(S) minus 119868(P)x denotes

the among-population phylogenetic information within a region and w119868W minus 119868(S)x denotes

the among-region phylogenetic information The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures Theorem C4 For the decomposition given in Eq (C7) in the three-level hierarchy we have the following inequalities under an ultrametric tree with depth T

0 le 119868W minus 119868(S) le minus119879sum 119908(s ln119908(st

sOP (C8) 0 le 119868

(S) minus 119868(P) le minus119879sum sum 119908Ls ln

lmulyu

TuLOP

tsOP (C9)

When all populations have identical allele relative frequency distributions we have 119868W =119868(S) = 119868

(P) When all populations are completely distinct phylogenetically (no shared branches across populations though branches within a population may be shared) we have 119868(S) minus 119868

(P) = minus119879sum sum 119908Ls lnlmulyu

TuLOP

tsOP and 119868W minus 119868

(S) = minus119879sum 119908(s ln119908(stsOP

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0 1] Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions and it takes the maximum value of 1 when all populations are phylogenetically completely distinct All the derivation procedures as well as the monotonicity and ldquotrue dissimilarityrdquo properties are parallel to those of allelic diversity and thus are omitted

1

SUPPLEMENTARYINFORMATION

FigureS1SchematicrepresentationofthecalculationofdiversitiesateachlevelofthehierarchyThefivegreenrectanglesrepresentlocalpopulationsthebluerepresentregionsandtheredrepresenttheecosystemShannon

entropies 119867lowast arecalculatedfromallelespeciesabundancesateach

levelhofthehierarchyandarethentransformedintoeffectivenumbersusingeq1

Within Between Total Decomposition

3 Ecosystem minus minus

2 Region

1 Population or Community

pi|+1 pi|+2

pi|++

Ha1(2)

Ha2(2)

Ha11(1)

Ha21(1)

Ha31(1)

Ha42(1)

Ha52(1)

Ha(1)

Ha(2)Hg

pi|11 pi|31pi|21 pi|42 pi|52

= amp ( ) = +

= amp (

=

= amp (

) = + =

= ) )

= )

= )

2

FigureS2Exampleofanultrametrictreewheretheterminalnodesrepresentspeciesalleleswithassociatedabundancesandtheinteriornodes

representingspeciationcoalescenteventsInthiscasethecalculationofeffectivenumbersisbasedonandextendedsetwherethefirstelements(inthepresentexample5)correspondtospeciesallelesabundancesandall

otherabundancescorrespondtotheabundanceoftheelementsdescendedfromtheinternalnodesInthepresentexamplethesetofabundancesfor8nodesisasfollows 119886119886⋯ 119886119886119886119886 = 119901119901⋯ 119901 119901 + 119901 119901 +

119901 + 119901 119901 + 119901 where 119901 istherelativeabundanceofspeciesi

p1 p3p2 p4 p5

p1 + p2

p4 + p5

p1 + p2 + p3

MRCA

3

RcodeforInformation-basedDiversityPartitioning(iDIP)UnderMulti-LevelHierarchicalStructuresforSpeciesandPhylogeneticDiversities

SpeciesAllelicDiversity(RFunctioniDIP) Inputshouldincludetwodatamatrices(calledldquoAbunrdquoandldquoStrucrdquorespectively)(a) Abunspecifyingspeciesalleles(rows)raworrelativefrequenciesineach

populationcommunity(columns)iDIPcannothandleldquoblankrdquoorldquoNArdquoentry

YoumustreplaceldquoblankrdquoorldquoNArdquoinyourdataby0oranynumericalvalueAlsotheremustbeatleastonespeciesalleleinapopulationoracommunity

(b) Strucspecifyingahierarchicalstructurematrixseeasimpleexamplebelow OurRcodecanbeappliedtoanynumberoflevelsForsimplicitywejustusea

three-levelhierarchicalstructuretoillustratehowtoinputdataConsidertherearetworegions(1and2)inanecosystemInRegion1therearetwopopulationsandinRegion2therearethreepopulationsThehierarchicalstructureis

displayedasthefollowing

Supposetherawallelefrequenciesaregiveninthefollowingmatrix(allelesin

rowsandpopulationsincolumns)

Ecosystem13

Region113

Population113 Population213

Region213

Population313 Population413 Population513

4

Pop1 Pop2 Pop3 Pop4 Pop5

Allele1 1 16 2 10 15

Allele2 0 0 0 5 14

Allele3 7 12 11 1 0

Allele4 0 5 14 1 21

Allele5 2 1 0 11 10

Allele6 0 1 3 2 0

Thehierarchicalstructurematrixforthissimpleexampleshouldbeinputasamatrixwithlevelsinrowsandpopulationsincolumns(Level1=populationlevelLevel2=regionlevelLevel3=ecosystemlevel)Hierarchicalstructureof

anynumberoflevelscanbeexpressedinasimilarmanner

Pop1 Pop2 Pop3 Pop4 Pop5

Level3 Ecosystem Ecosystem Ecosystem Ecosystem Ecosystem

Level2 Region1 Region1 Region2 Region2 Region2

Level1 Population1 Population2 Population3 Population4 Population5

FortheabovesimpleexampleinputdataforRfunctioniDIP(codegivenbelow)

areshownbelow InputdataData=cbind(c(107020)c(16012511)c(20111403)c(10511112)c(151

4021100))Struc=rbind(rep(Ecosystem5)c(rep(Region12)rep(Region23))paste0(Population15))

RunRfunction iDIP(DataStruc)

5

Output is given below

[1]

D_gamma 5272

D_alpha2 4679

D_alpha1 3540

D_beta2 1127

D_beta1 1322

Proportion2 0300

Proportion1 0700

Differentiation2 0204

Differentiation1 0310

We give simple interpretations for the above output (all the ldquoeffectiverdquo is in asenseofequallyabundantallelespopulationsregions)(1a) D_gamma = 5272 is interpreted as that the effective number of alleles in the

ecosystem (total diversity) is 5272

(1b) D_alpha2 = 4679 is interpreted as that each region contains 4679 allele

equivalents

D-beta2 = 1127 implies that there are 113 region equivalents Thus 4679 x

1127 = 5272 (=D_gamma)

(1c) D_alpha1 = 3540 is interpreted as that each population within a region contains

3540 allele equivalents

D_beta1 =1322 is interpreted as that there are 132 population equivalents per

region

Here 1322 x 3540 = 4679 species per region (= D_alpha2)

(2) Proportion2 = 030 means that the proportion of total beta information found at

the regional level is 30

Proportion1 = 070 means that the proportion of total beta information found at

the population level is 70

(3) Differentiation2 =0204 implies that the mean differentiationdissimilarity among

regions is 0204 This can be interpreted as the following effective sense the mean

proportion of non-shared alleles in a region is around 204

Differentiation1 =0310 implies that the mean differentiationdissimilarity among

6

populations within a region is 031 ie the mean proportion of non-shared alleles

in a population is around 310

# R code for Information-based Diversity Partitioning for Any Number of Levels
# Input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
# Output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
#   (2) Proportion of total beta information found at each level.
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank"
# or "NA" in your data by 0 or any numerical value. Also, there must be at least
# one species/allele in a population or a community.

# Main function iDIP
iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga > 0] / n
  G = sum(-gp * log(gp))          # gamma (pooled) Shannon entropy
  S = length(gp)
  H = nrow(struc)                 # number of hierarchical levels
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  # Lowest level (populations/communities)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H - 1] = sum(wi * Ai)

  # Intermediate levels: pool the columns that belong to the same unit
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          ai[, j] = abun[, II]
        } else {
          ai[, j] = rowSums(abun[, II])
        }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i - 1] = sum(wi * Ai)
    }
  }

  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }
  Gamma = exp(G); Alpha = exp(A); Diff = Diff; Prop = Prop

  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha", (H - 1):1),
                     paste0("D_beta", (H - 1):1),
                     paste0("Proportion", (H - 1):1),
                     paste0("Differentiation", (H - 1):1))
  return(out)
}
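The first quantity computed by iDIP, the gamma entropy and its effective number, can be checked by hand. The following is a minimal sketch, not part of the authors' code: the helper name `shannon_entropy` is hypothetical, and the abundance matrix is assumed to be the five-population, six-allele example used in this supplement.

```r
# Example abundances (alleles in rows, populations in columns).
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))

# Hypothetical helper (not part of iDIP): Shannon entropy of a count vector,
# dropping zero counts so that 0 * log(0) is treated as 0.
shannon_entropy <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

H_gamma <- shannon_entropy(rowSums(Data))  # entropy of the pooled ecosystem
D_gamma <- exp(H_gamma)                    # Hill number of order q = 1
print(round(D_gamma, 3))
```

Under these assumptions the printed value should agree with the D_gamma entry of the iDIP output for the same data.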

Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described for species diversity, we also need to input a phylogenetic "Tree" in Newick tree format.

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying the hierarchical structure matrix; see the simple example given for species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) = paste0("Allele", 1:6)
Struc = rbind(rep("Ecosystem", 5), c(rep("Region1", 2), rep("Region2", 3)), paste0("Community", 1:5))
Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

# Run R function
iDIPphylo(Data, Struc, Tree)

Output is given below.

               [,1]
Faith's PD  321.525
mean_T       94.169
PD_gamma    274.388
PD_alpha2   255.194
PD_alpha1   223.231
PD_beta2      1.075
PD_beta1      1.143
PD_prop2      0.351
PD_prop1      0.649
PD_diff2      0.124
PD_diff1      0.149

The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194.

PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 × 1.075 ≈ 274.388 (= PD_gamma).

(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region.

Here 223.231 × 1.143 ≈ 255.194 (= PD_alpha2).

(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%.

PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%.

PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
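Items (1) and (2) above can be reproduced without any phylogenetic package. The sketch below is only an illustration in base R, not the authors' implementation: the branch labels `clade12`, `clade123` and `clade456` are hypothetical names for the three internal branches, and the branch lengths assume the decimal placement shown here for the example Newick tree.

```r
# Tips (by row index in Data) descended from each branch of the example tree.
descendants <- list(Allele1 = 1, Allele2 = 2, Allele3 = 3,
                    Allele4 = 4, Allele5 = 5, Allele6 = 6,
                    clade12 = c(1, 2), clade123 = c(1, 2, 3), clade456 = c(4, 5, 6))

# Branch lengths, in the same order as 'descendants' (assumed decimal placement).
branch_len <- c(Allele1 = 16.66254448, Allele2 = 28.86156926, Allele3 = 59.19367445,
                Allele4 = 9.67060281, Allele5 = 49.65919121, Allele6 = 15.361314,
                clade12 = 43.70264926, clade123 = 43.49065302, clade456 = 54.92297125)

Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
p <- rowSums(Data) / sum(Data)   # pooled relative abundances of the six alleles

# Node abundance a_i: total relative abundance descended from branch i.
a <- sapply(descendants, function(tips) sum(p[tips]))

faith_pd <- sum(branch_len[a > 0])   # Faith's PD: total length of occupied branches
mean_T   <- sum(branch_len * a)      # abundance-weighted mean root-to-tip distance
print(round(c(faith_pd, mean_T), 3))
```

With this decimal placement the two printed values should match the "Faith's PD" and "mean_T" rows of the output, which is also a useful check that the tree was entered as intended.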

# R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
# Input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
#   tree:  a Newick-format phylogenetic tree spanned by all focal species considered in a study
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA"
# in your data by 0 or any numerical value. Also, there must be at least one
# species/allele in a population or a community.
# Output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root node
#       to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level.
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information
#       found in Level 1 and Level 2, respectively.
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1
#       measures the mean phylogenetic dissimilarity among communities (Level 1) within a
#       region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among
#       regions (Level 2), etc.

# Three packages "ade4", "ape" and "phytools" must be installed first.
install.packages("ade4"); library(ade4)
install.packages("ape"); library(ape)
install.packages("phytools"); library(phytools)

# Main function iDIPphylo
iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
  # M[i, b] = 1 if branch b lies on the path from the root to leaf i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }
  # pA: branch-by-population abundances; pB: branch lengths
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) {
    pA[, i] = Temp[, i] %*% M
  }
  pB = c(phyloData$leaves, phyloData$nodes)
  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n
  TT = sum(gp * pB)
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])

  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H - 1] = sum(wi * Ai)

  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else {
          pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
        }
      }
      # pi=sapply(1:NN,function(k) ai[,k]/sum(ai[,k]))   (disabled: 'ai' is not defined in this function)
      wi = ni / sum(ni)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i - 1] = sum(wi * Ai)
    }
  }

  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }
  Gamma = exp(G) * TT; Alpha = exp(A) * TT; Diff = Diff; Prop = Prop
  # Gamma=exp(G);Alpha=exp(A);Diff=Diff;Prop=Prop   (disabled alternative without rescaling by TT)

  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  # out1=iDIP(abun,struc); out=cbind(out1,out2)   (disabled: 'out2' is not defined)
  rownames(out) <- c(paste("Faith's PD"), paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H - 1):1),
                     paste0("PD_beta", (H - 1):1),
                     paste0("PD_prop", (H - 1):1),
                     paste0("PD_diff", (H - 1):1))
  return(out)
}


From the above theorem, the normalized differentiation measure (Shannon differentiation) is formulated as

$$\Delta_D = \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{J} w_j \ln w_j}. \tag{C1}$$

This measure takes the minimum value of 0 when all populations have identical allele frequency distributions, and it takes the maximum value of 1 when the J populations are completely distinct (no shared alleles). The Shannon differentiation measure satisfies two monotonicity properties (stated in the first part of the following theorem) that heterozygosity-based measures lack.

Theorem C2. Desirable monotonicity and "true dissimilarity" properties for Shannon differentiation.

(A) Shannon differentiation (given in Eq. C1) satisfies the following monotonicity properties that heterozygosity-based measures lack:

(A1) Shannon differentiation always increases when some copies of an allele that is shared between two or more populations are replaced by copies of an unshared allele.

(A2) Shannon differentiation is always non-decreasing when a new allele is added to a single population with any abundance. Here the population weights can be a set of specified weights or relative population sizes.

(B) In addition, Shannon differentiation also satisfies the "true dissimilarity" property: if multiple communities each have S equally common species, with exactly A species shared by all of them and with the remaining species in each community not shared with any other community, then the Shannon differentiation measure gives 1 − A/S, the true proportion of non-shared species in a community.
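As a small worked illustration of property (B), consider a hypothetical case that is not taken from the source: J = 2 communities with equal weights $w_1 = w_2 = 1/2$, each with S = 4 equally common species, of which A = 1 is shared. The pooled assemblage then has one species with relative abundance 1/4 and six species with relative abundance 1/8 each, so Eq. (C1) gives:

```latex
\begin{align*}
H_\alpha &= \ln 4 = 2\ln 2,\\
H_\gamma &= -\tfrac14 \ln\tfrac14 - 6\cdot\tfrac18 \ln\tfrac18
          = \tfrac12 \ln 2 + \tfrac94 \ln 2 = \tfrac{11}{4}\ln 2,\\
\Delta_D &= \frac{H_\gamma - H_\alpha}{-\sum_{j=1}^{2} w_j \ln w_j}
          = \frac{\tfrac34 \ln 2}{\ln 2} = \tfrac34 = 1 - \frac{A}{S}.
\end{align*}
```

Three of the four species in each community are unshared, which is exactly the value returned by the measure in this equal-weight case.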

Proof. For Part (A), see Appendix S6 in Chao, Jost et al. (2015) for proof details and counter-examples; for Part (B), see Chao and Chiu (2016).

Shannon differentiation measures in three-level hierarchy

Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. From Table 3 of the main text, Shannon gamma and alpha entropies are expressed as

$$H_\gamma = -\sum_{i=1}^{S} p_{i|++} \ln p_{i|++},$$

$$H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k},$$

$$H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

(See Appendix B for all notation.) The well-known additive decomposition for Shannon entropy is

$$H_\gamma = H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right]. \tag{C2}$$

Here $H_\alpha^{(1)}$ denotes the within-population information, $[H_\alpha^{(2)} - H_\alpha^{(1)}]$ denotes the among-population information within a region, and $[H_\gamma - H_\alpha^{(2)}]$ denotes the among-region information. In the following theorem, the maximum value for each of the latter two components is derived so that we can obtain the corresponding normalized differentiation measures.

Theorem C3. For the decomposition given in Eq. (C2) in the three-level hierarchy, we have the following inequalities:

$$0 \le H_\gamma - H_\alpha^{(2)} \le -\sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C3}$$

$$0 \le H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C4}$$

When all populations have identical allele relative frequency distributions, we have $H_\gamma = H_\alpha^{(2)} = H_\alpha^{(1)}$. When all populations are completely distinct (no shared alleles), we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

Proof. To prove Eq. (C3), we consider the following two-level (ecosystem/region) hierarchy: there are K regions with allele relative frequency $p_{i|+k}$ for allele i in region k, with the region weights $(w_{+1}, w_{+2}, \ldots, w_{+K})$. Note that we can express $p_{i|++}$ as

$$p_{i|++} = \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} = \sum_{k=1}^{K} w_{+k}\, p_{i|+k}.$$

Then the "gamma" entropy for this two-level system is

$$H_\gamma = -\sum_{i=1}^{S} \left(\sum_{k=1}^{K} w_{+k}\, p_{i|+k}\right) \ln \left(\sum_{m=1}^{K} w_{+m}\, p_{i|+m}\right).$$

The corresponding "alpha" entropy for this two-level system is $-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, which is $H_\alpha^{(2)}$. Eq. (C3) then follows directly from Theorem C1.

To prove Eq. (C4), we consider the following two-level hierarchy (region k and all populations within region k): there are $J_k$ populations with allele relative frequency $p_{i|jk}$ for allele i in population j, with population weights $(w_{1k}/w_{+k},\, w_{2k}/w_{+k},\, \ldots,\, w_{J_k k}/w_{+k})$, $j = 1, 2, \ldots, J_k$. The "gamma" entropy for this two-level hierarchy is the entropy value for region k, i.e., $H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}$, where $p_{i|+k} = \sum_{j=1}^{J_k} (w_{jk}/w_{+k})\, p_{i|jk}$. The corresponding "alpha" entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem C1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over k with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4).

From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \tag{C5}$$
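To make Eq. (C5) concrete, the following sketch evaluates the among-region differentiation for the five-population example used in the R code part of this supplement. It is only an illustration under stated assumptions: the helper `shannon_entropy` and the vector `region` are hypothetical names, and region weights are taken proportional to the total counts in each region, as in the iDIP code.

```r
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c("Region1", "Region1", "Region2", "Region2", "Region2")

# Hypothetical helper: Shannon entropy of a count vector (zero counts dropped).
shannon_entropy <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Pool the populations of each region (alleles in rows, regions in columns).
region_counts <- sapply(unique(region),
                        function(r) rowSums(Data[, region == r, drop = FALSE]))
w_region <- colSums(region_counts) / sum(Data)          # weights w_{+k}

H_gamma  <- shannon_entropy(rowSums(Data))              # gamma entropy
H_alpha2 <- sum(w_region * apply(region_counts, 2, shannon_entropy))

# Eq. (C5): among-region information divided by its maximum possible value.
delta2 <- (H_gamma - H_alpha2) / (-sum(w_region * log(w_region)))
print(round(delta2, 3))
```

Under these assumptions the result should agree with the Differentiation2 entry reported for this example, and exp(H_alpha2) with D_alpha2.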

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}. \tag{C6}$$

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case we can decompose the gamma entropy as

$$
\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \left[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln \left(w_{jk}\, p_{i|jk}\right)\right] \\
&= -\sum_{i=1}^{S} \left[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \left(\ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k}\right)\right] \\
&= -\sum_{i=1}^{S} \left[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln p_{i|jk}\right]
   - \sum_{i=1}^{S} \left[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln \frac{w_{jk}}{w_{+k}}\right]
   - \sum_{i=1}^{S} \left[\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln w_{+k}\right] \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \left[H_\alpha^{(2)} - H_\alpha^{(1)}\right] + \left[H_\gamma - H_\alpha^{(2)}\right].
\end{aligned}
$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

As we proved in Theorem C3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and "true dissimilarity" properties stated in Theorem C2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. Shannon gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i} L_i\, a_{i|++} \ln a_{i|++},$$

$$I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i} L_i\, a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \sum_{i} L_i\, a_{i|jk} \ln a_{i|jk}.$$

Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \left[I_\alpha^{(2)} - I_\alpha^{(1)}\right] + \left[I_\gamma - I_\alpha^{(2)}\right]. \tag{C7}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $[I_\alpha^{(2)} - I_\alpha^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I_\alpha^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem so that we can obtain the corresponding differentiation measures.

Theorem C4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth T:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
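As a rough numerical illustration of the normalization implied by Eq. (C8), the among-region phylogenetic differentiation can be approximated from rounded values reported for the worked iDIPphylo example. This is a sketch under assumptions rather than a result from the source: it assumes region weights proportional to region sizes (45/165 and 120/165 in that example), uses the relation $(I_\gamma - I_\alpha^{(2)})/T = \ln(\mathrm{PD}_\gamma / \mathrm{PD}_{\alpha 2}) = \ln \mathrm{PD}_{\beta 2}$, which follows from how the R code rescales entropies by mean_T, and writes the normalized among-region measure as PD_diff2.

```latex
\begin{align*}
-\sum_{k=1}^{2} w_{+k}\ln w_{+k}
  &= -\tfrac{45}{165}\ln\tfrac{45}{165} - \tfrac{120}{165}\ln\tfrac{120}{165}
   \approx 0.3544 + 0.2316 = 0.5860,\\
\frac{I_\gamma - I_\alpha^{(2)}}{T}
  &= \ln \mathrm{PD}_{\beta 2} = \ln 1.075 \approx 0.0723,\\
\mathrm{PD\_diff2}
  &= \frac{I_\gamma - I_\alpha^{(2)}}{-T\sum_{k} w_{+k}\ln w_{+k}}
   \approx \frac{0.0723}{0.5860} \approx 0.123 .
\end{align*}
```

This agrees with the reported PD_diff2 = 0.124 to within the rounding of PD_beta2.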

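SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represents the ecosystem. Shannon entropies, H*, are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 diagram. Rows: level 3 (Ecosystem), level 2 (Region), level 1 (Population or Community). Columns: Within, Between, Total, Decomposition. Relative abundances shown: $p_{i|++}$ (ecosystem); $p_{i|+1}$, $p_{i|+2}$ (regions); $p_{i|11}$, $p_{i|21}$, $p_{i|31}$, $p_{i|42}$, $p_{i|52}$ (populations). Entropies shown: $H_\gamma$; $H_\alpha^{(2)}$ with $H_{\alpha 1}^{(2)}$, $H_{\alpha 2}^{(2)}$; $H_\alpha^{(1)}$ with $H_{\alpha 11}^{(1)}$, $H_{\alpha 21}^{(1)}$, $H_{\alpha 31}^{(1)}$, $H_{\alpha 42}^{(1)}$, $H_{\alpha 52}^{(1)}$.]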

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\; p_1 + p_2,\; p_1 + p_2 + p_3,\; p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2 diagram. Tips: $p_1$, $p_2$, $p_3$, $p_4$, $p_5$. Internal nodes: $p_1 + p_2$, $p_4 + p_5$, $p_1 + p_2 + p_3$. Root: MRCA.]

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as follows:

Ecosystem
    Region 1
        Population 1
        Population 2
    Region 2
        Population 3
        Population 4
        Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

          Pop1  Pop2  Pop3  Pop4  Pop5
Allele1      1    16     2    10    15
Allele2      0     0     0     5    14
Allele3      7    12    11     1     0
Allele4      0     5    14     1    21
Allele5      2     1     0    11    10
Allele6      0     1     3     2     0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure of any number of levels can be expressed in a similar manner.

          Pop1         Pop2         Pop3         Pop4         Pop5
Level3    Ecosystem    Ecosystem    Ecosystem    Ecosystem    Ecosystem
Level2    Region1      Region1      Region2      Region2      Region2
Level1    Population1  Population2  Population3  Population4  Population5

For the above simple example, the input data for the R function iDIP are as follows.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem", 5), c(rep("Region1", 2), rep("Region2", 3)), paste0("Population", 1:5))

# Run R function
iDIP(Data, Struc)

Output is given below.

                   [,1]
D_gamma           5.272
D_alpha2          4.679
D_alpha1          3.540
D_beta2           1.127
D_beta1           1.322
Proportion2       0.300
Proportion1       0.700
Differentiation2  0.204
Differentiation1  0.310

We give simple interpretations for the above output (every "effective" number is in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents.

D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 × 1.127 ≈ 5.272 (= D_gamma).

(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents.

D_beta1 = 1.322 means that there are 1.32 population equivalents per region.

Here 1.322 × 3.540 ≈ 4.679 allele equivalents per region (= D_alpha2).

(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%.

Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%.

Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
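The reported values can be cross-checked against one another. The following worked check is an illustration based on the rounded output, so equalities hold only up to rounding; it uses the multiplicative relation between alpha, beta and gamma effective numbers, and the fact that each proportion is the share of the total log beta diversity contributed by that level (as in the `Prop` computation of the iDIP code).

```latex
\begin{align*}
D_{\alpha 1}\times D_{\beta 1} &= 3.540 \times 1.322 \approx 4.68 \approx D_{\alpha 2},\\
D_{\alpha 2}\times D_{\beta 2} &= 4.679 \times 1.127 \approx 5.27 \approx D_{\gamma},\\
\text{Proportion2} &= \frac{\ln D_{\beta 2}}{\ln D_{\beta 1} + \ln D_{\beta 2}}
  = \frac{0.1196}{0.2791 + 0.1196} \approx 0.300,\\
\text{Proportion1} &= \frac{\ln D_{\beta 1}}{\ln D_{\beta 1} + \ln D_{\beta 2}}
  = \frac{0.2791}{0.2791 + 0.1196} \approx 0.700 .
\end{align*}
```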


Page 23: Diversity from genes to ecosystems: A unifying framework ...chao.stat.nthu.edu.tw › wordpress › paper › 128_pdf_appendix.pdf · the other hand, population genetics was initially

5

latter two components is derived so that we can obtain the corresponding normalized differentiation measures Theorem S33 For the decomposition given in Eq (C2) in three-level hierarchy we have the following inequalities

0 le 119867W minus 119867(S) le minussum 119908(s ln119908(st

sOP (C3) 0 le 119867

(S) minus 119867(P) le minussum sum 119908Ls ln

lmulyu

TuLOP

tsOP (C4)

When all populations have identical allele relative frequency distributions we have 119867W =119867(S) = 119867

(P) When all populations are completely distinct (no shared alleles) we have 119867

(S) minus 119867(P) = minussum sum 119908Ls ln

lmulyu

TuLOP

tsOP and 119867W minus 119867

(S) = minussum 119908(s ln119908(stsOP

Proof To prove Eq (C3) we consider the following two-level (ecosystemregion) hierarchy there are K regions with allele relative frequency 119901K|(s for species i in region k with the region weights Q119908(P 119908(S⋯ 119908(TU Note that we can express 119901K|(( as

119901K|(( = sum sum 119908Ls119901K|LsTuLOP

tsOP = sum 119908(s119901K|(st

sOP Then the ldquogammardquo entropy for this two-level system is

119867W = minussum Qsum 119908(s119901K|(slnQsum 119908([119901K|([t[OP Ut

sOP UNKOP

The corresponding ldquoalphardquo entropy for this two-level system is minussum 119908(s sum 119901K|(s ln 119901K|(sN

zOPtsOP which is 119867

(S) Eq (C3) then follows directly from Theorem C1

To prove Eq. (C4), we consider the following two-level hierarchy (region $k$ and all populations within region $k$): there are $J_k$ populations, with allele relative frequency $p_{i|jk}$ for allele $i$ in population $j$, and with population weights $\left( \frac{w_{1k}}{w_{+k}}, \frac{w_{2k}}{w_{+k}}, \cdots, \frac{w_{J_k k}}{w_{+k}} \right)$, $j = 1, 2, \cdots, J_k$. The “gamma” entropy for this two-level hierarchy is the entropy value for region $k$, i.e.

$$H_{\alpha k}^{(2)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k}, \quad \text{where} \quad p_{i|+k} = \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}}\, p_{i|jk}.$$

The corresponding “alpha” entropy is

$$\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk}.$$

Then Theorem C1 leads to

$$H_{\alpha k}^{(2)} - \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} H_{\alpha jk}^{(1)} = -\sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} \le -\sum_{j=1}^{J_k} \frac{w_{jk}}{w_{+k}} \ln \frac{w_{jk}}{w_{+k}}.$$

Summing over $k$ with weight $w_{+k}$ on both sides of the above inequality, we obtain

$$-\sum_{k=1}^{K} w_{+k} \sum_{i=1}^{S} p_{i|+k} \ln p_{i|+k} + \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \sum_{i=1}^{S} p_{i|jk} \ln p_{i|jk} = H_\alpha^{(2)} - H_\alpha^{(1)} \le -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}.$$

This proves Eq. (C4). From the above theorem, we have $0 \le H_\alpha^{(1)} \le H_\alpha^{(2)} \le H_\gamma$, i.e., the gamma diversity of any level is greater than or equal to the corresponding alpha diversity at the same level. Eq. (C3) leads to the following normalized differentiation measure among regions:

$$\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{-\sum_{k=1}^{K} w_{+k} \ln w_{+k}}. \qquad \text{(C5)}$$
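As a quick worked case of Eq. (C5), assume that the $K$ regions carry equal weights. This is only an illustrative choice, not a requirement of the framework. The denominator then reduces to $\ln K$:

```latex
w_{+k} = \frac{1}{K} \ \ (k = 1, \ldots, K)
\quad\Longrightarrow\quad
-\sum_{k=1}^{K} w_{+k} \ln w_{+k} = \ln K ,
\qquad
\Delta_D^{(2)} = \frac{H_\gamma - H_\alpha^{(2)}}{\ln K} .
```

With $K = 2$ equally weighted regions, for instance, the among-region information $H_\gamma - H_\alpha^{(2)}$ can be at most $\ln 2 \approx 0.693$.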

Likewise, Eq. (C4) leads to the following normalized differentiation measure among populations within a region:

$$\Delta_D^{(1)} = \frac{H_\alpha^{(2)} - H_\alpha^{(1)}}{-\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln \left( w_{jk} / w_{+k} \right)}. \qquad \text{(C6)}$$

Each of the two differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are completely distinct (no shared alleles). Note that in the latter case each allele occurs in a single population, so that $p_{i|++} = w_{jk}\, p_{i|jk}$ for that population, and we can decompose the gamma entropy as

$$
\begin{aligned}
H_\gamma &= -\sum_{i=1}^{S} \left\{ \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln \left( w_{jk}\, p_{i|jk} \right) \right\} \\
&= -\sum_{i=1}^{S} \left\{ \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \left[ \ln p_{i|jk} + \ln \frac{w_{jk}}{w_{+k}} + \ln w_{+k} \right] \right\} \\
&= -\sum_{i=1}^{S} \left\{ \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln p_{i|jk} \right\}
 - \sum_{i=1}^{S} \left\{ \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln \frac{w_{jk}}{w_{+k}} \right\}
 - \sum_{i=1}^{S} \left\{ \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk}\, p_{i|jk} \ln w_{+k} \right\} \\
&= H_\alpha^{(1)} - \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} - \sum_{k=1}^{K} w_{+k} \ln w_{+k} \\
&\equiv H_\alpha^{(1)} + \left[ H_\alpha^{(2)} - H_\alpha^{(1)} \right] + \left[ H_\gamma - H_\alpha^{(2)} \right].
\end{aligned}
$$

In this special case, we have $H_\alpha^{(2)} - H_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}$ and $H_\gamma - H_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \ln w_{+k}$.
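The two limiting cases of Eqs. (C5) and (C6) can be checked numerically. The sketch below is a minimal illustration, not part of the original appendix: the function names `shannon` and `delta_three_level` and both toy data sets are hypothetical, and populations are assumed to be weighted by their share of the total count.

```r
# Shannon entropy (natural log) of a frequency vector; zero entries contribute 0.
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Differentiation measures of Eqs. (C5) and (C6) for a three-level hierarchy.
# abun:   alleles x populations count matrix
# region: region label of each population (column)
delta_three_level <- function(abun, region) {
  region <- as.character(region)
  w_jk <- colSums(abun) / sum(abun)                          # population weights
  w_k  <- vapply(split(w_jk, region), sum, numeric(1))       # region weights
  p_pop <- sweep(abun, 2, colSums(abun), "/")                # p_{i|jk}
  reg_abun <- sapply(split(seq_len(ncol(abun)), region),
                     function(idx) rowSums(abun[, idx, drop = FALSE]))
  p_reg <- sweep(reg_abun, 2, colSums(reg_abun), "/")        # p_{i|+k}
  H_gamma  <- shannon(rowSums(abun) / sum(abun))
  H_alpha2 <- sum(w_k * apply(p_reg, 2, shannon))
  H_alpha1 <- sum(w_jk * apply(p_pop, 2, shannon))
  max2 <- -sum(w_k * log(w_k))                               # bound in Eq. (C3)
  max1 <- -sum(w_jk * log(w_jk / w_k[region]))               # bound in Eq. (C4)
  c(Delta2 = (H_gamma - H_alpha2) / max2,
    Delta1 = (H_alpha2 - H_alpha1) / max1)
}

regions <- c("A", "A", "B")

# Completely distinct populations: no allele is shared between columns.
distinct <- cbind(c(3, 7, 0, 0, 0, 0), c(0, 0, 5, 5, 0, 0), c(0, 0, 0, 0, 2, 18))
res_distinct <- delta_three_level(distinct, regions)

# Identical relative frequencies in every population.
identical_pops <- cbind(c(1, 2, 3), c(2, 4, 6), c(10, 20, 30))
res_identical <- delta_three_level(identical_pops, regions)

print(round(res_distinct, 6))
print(round(res_identical, 6))
```

Under these assumptions both measures should come out at their maximum for the first data set and at their minimum for the second, up to floating-point error.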

As we proved in Theorem C3, each differentiation measure can be regarded as based on a two-level hierarchy. Thus, each differentiation measure satisfies the corresponding monotonicity and “true dissimilarity” properties stated in Theorem C2.

Phylogenetic differentiation measures in three-level hierarchy

For phylogenetic differentiation measures based on ultrametric trees, all derivation steps are parallel to those of allelic diversity. Consider the three-level hierarchy (ecosystem-region-population) in Table 1 of the main text. Shannon gamma and alpha entropies are expressed as (Table 3 of the main text)

$$I_\gamma = -\sum_{i} L_i\, a_{i|++} \ln a_{i|++},$$

$$I_\alpha^{(2)} = -\sum_{k=1}^{K} w_{+k} \sum_{i} L_i\, a_{i|+k} \ln a_{i|+k},$$

$$I_\alpha^{(1)} = -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \sum_{i} L_i\, a_{i|jk} \ln a_{i|jk},$$

where the inner sums run over the branches $i$ of the tree and $L_i$ is the length of branch $i$. Corresponding to Eq. (C2), we have a similar decomposition for phylogenetic entropy:

$$I_\gamma = I_\alpha^{(1)} + \left[ I_\alpha^{(2)} - I_\alpha^{(1)} \right] + \left[ I_\gamma - I_\alpha^{(2)} \right]. \qquad \text{(C7)}$$

Here $I_\alpha^{(1)}$ denotes the within-population phylogenetic information, $\left[ I_\alpha^{(2)} - I_\alpha^{(1)} \right]$ denotes the among-population phylogenetic information within a region, and $\left[ I_\gamma - I_\alpha^{(2)} \right]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem, so that we can obtain the corresponding differentiation measures.

Theorem C4. For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth $T$:

$$0 \le I_\gamma - I_\alpha^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \qquad \text{(C8)}$$

$$0 \le I_\alpha^{(2)} - I_\alpha^{(1)} \le -T \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \qquad \text{(C9)}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I_\alpha^{(2)} = I_\alpha^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have

$$I_\alpha^{(2)} - I_\alpha^{(1)} = -T \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}} \quad \text{and} \quad I_\gamma - I_\alpha^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}.$$

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and “true dissimilarity” properties, are parallel to those of allelic diversity and thus are omitted.
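For readers who want to see where the factor $T$ comes from, the completely distinct case can be sketched in the same way as for allelic diversity. The layout of the steps below is ours rather than the original appendix's. It assumes that every branch with nonzero abundance belongs to a single population, so that $a_{i|++} = w_{jk}\, a_{i|jk}$, and it uses the ultrametric-tree identity $\sum_i L_i\, a_{i|jk} = T$ for every population.

```latex
\begin{aligned}
I_\gamma
&= -\sum_{k=1}^{K}\sum_{j=1}^{J_k}\sum_{i} L_i\, w_{jk}\, a_{i|jk} \ln\left( w_{jk}\, a_{i|jk} \right) \\
&= -\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \sum_{i} L_i\, a_{i|jk} \ln a_{i|jk}
   \;-\; \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln w_{jk} \sum_{i} L_i\, a_{i|jk} \\
&= I_\alpha^{(1)} \;-\; T \sum_{k=1}^{K}\sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}
   \;-\; T \sum_{k=1}^{K} w_{+k} \ln w_{+k} .
\end{aligned}
```

The last step splits $\ln w_{jk}$ into $\ln (w_{jk}/w_{+k}) + \ln w_{+k}$ and uses $\sum_{j} w_{jk} = w_{+k}$, which reproduces the two upper bounds in Eqs. (C8) and (C9).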

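SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue represent regions and the red represent the ecosystem. Shannon entropies $H_{\ast}$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using Eq. 1.

[Figure S1 diagram: rows are the hierarchical levels (3 Ecosystem, 2 Region, 1 Population or Community); columns are Within, Between, Total and Decomposition. The population level is labelled with frequencies $p_{i|11}, p_{i|21}, p_{i|31}, p_{i|42}, p_{i|52}$ and entropies $H_{\alpha 11}^{(1)}, H_{\alpha 21}^{(1)}, H_{\alpha 31}^{(1)}, H_{\alpha 42}^{(1)}, H_{\alpha 52}^{(1)}$ and $H_\alpha^{(1)}$; the region level with $p_{i|+1}, p_{i|+2}$ and $H_{\alpha 1}^{(2)}, H_{\alpha 2}^{(2)}$ and $H_\alpha^{(2)}$; the ecosystem level with $p_{i|++}$ and $H_\gamma$.]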

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set, where the first elements (in the present example, 5) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for 8 nodes is as follows: $\{a_1, a_2, \cdots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \cdots, p_5,\ p_1 + p_2,\ p_1 + p_2 + p_3,\ p_4 + p_5\}$, where $p_i$ is the relative abundance of species $i$.

[Figure S2 diagram: tips $p_1, p_2, p_3, p_4, p_5$; interior nodes $p_1 + p_2$, $p_1 + p_2 + p_3$ and $p_4 + p_5$; root labelled MRCA.]
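The extended abundance set described in the caption of Figure S2 can be written out in a few lines of R. This is only a sketch: the tip abundances are made-up values, and the names `node6`, `node7` and `node8` are hypothetical labels for the three interior nodes.

```r
# Hypothetical relative abundances for the five tips (species/alleles) of Figure S2.
p <- c(p1 = 0.10, p2 = 0.20, p3 = 0.30, p4 = 0.15, p5 = 0.25)

# Tips descended from each interior node of the tree in Figure S2.
clades <- list(
  node6 = c("p1", "p2"),
  node7 = c("p1", "p2", "p3"),
  node8 = c("p4", "p5")
)

# Node abundance = summed abundance of the tips descended from that node.
node_abun <- vapply(clades, function(tips) sum(p[tips]), numeric(1))

# Extended set: the five tip abundances first, then the interior-node abundances.
a <- c(p, node_abun)
print(a)
```

Unlike the tip abundances, the entries of the extended set do not sum to 1, because each interior node repeats the abundance of the tips below it.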

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called “Abun” and “Struc”, respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle a “blank” or “NA” entry. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1, there are two populations, and in Region 2, there are three populations. The hierarchical structure is displayed as the following:

    Ecosystem
        Region 1
            Population 1, Population 2
        Region 2
            Population 3, Population 4, Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

               Pop1   Pop2   Pop3   Pop4   Pop5
    Allele1       1     16      2     10     15
    Allele2       0      0      0      5     14
    Allele3       7     12     11      1      0
    Allele4       0      5     14      1     21
    Allele5       2      1      0     11     10
    Allele6       0      1      3      2      0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). Hierarchical structure of any number of levels can be expressed in a similar manner.

              Pop1          Pop2          Pop3          Pop4          Pop5
    Level3    Ecosystem     Ecosystem     Ecosystem     Ecosystem     Ecosystem
    Level2    Region1       Region1       Region2       Region2       Region2
    Level1    Population1   Population2   Population3   Population4   Population5
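Because the structure matrix is typed by hand, it can be worth checking that it really is nested before running any analysis. The sketch below builds the matrix from the example and tests the nesting; `is_nested` is a hypothetical helper written for this illustration and is not part of iDIP.

```r
# Structure matrix from the example: rows are levels (top level first),
# columns are populations.
Struc <- rbind(
  rep("Ecosystem", 5),
  c(rep("Region1", 2), rep("Region2", 3)),
  paste0("Population", 1:5)
)

# Hypothetical helper: TRUE if every unit at a lower level belongs to
# exactly one unit at the level directly above it.
is_nested <- function(struc) {
  for (h in seq_len(nrow(struc) - 1)) {
    upper <- struc[h, ]
    lower <- struc[h + 1, ]
    parents_per_unit <- tapply(upper, lower, function(u) length(unique(u)))
    if (any(parents_per_unit != 1)) return(FALSE)
  }
  TRUE
}

ok <- is_nested(Struc)

# A deliberately broken structure: "Population1" appears under both regions.
Bad <- Struc
Bad[3, ] <- c("Population1", "Population1", "Population1", "Population4", "Population5")
bad_ok <- is_nested(Bad)

print(c(example = ok, broken = bad_ok))
```

The check only looks at labels, so it assumes that identical labels within a row always refer to the same unit.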

For the above simple example, input data for R function iDIP (code given below) are shown below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

    # Run R function
    iDIP(Data, Struc)

Output is given below:

                        [,1]
    D_gamma            5.272
    D_alpha2           4.679
    D_alpha1           3.540
    D_beta2            1.127
    D_beta1            1.322
    Proportion2        0.300
    Proportion1        0.700
    Differentiation2   0.204
    Differentiation1   0.310

We give simple interpretations for the above output (“effective” is always in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).

(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).

(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
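The relationships used in these interpretations can be checked directly from the reported numbers. The sketch below only re-uses the rounded values printed in the output, so small discrepancies in the third decimal are expected; the variable names `products` and `props` are ours.

```r
# Values as reported in the example output (rounded to three decimals).
D_gamma <- 5.272; D_alpha2 <- 4.679; D_alpha1 <- 3.540
D_beta2 <- 1.127; D_beta1 <- 1.322

# Multiplicative partitioning: alpha x beta at one level gives the
# diversity one level up.
products <- c(region_level     = D_alpha2 * D_beta2,   # close to D_gamma
              population_level = D_alpha1 * D_beta1)   # close to D_alpha2

# On the log (entropy) scale the partitioning is additive; each proportion
# is one level's share of the total beta information.
total_beta <- log(D_gamma) - log(D_alpha1)
props <- c(Proportion2 = (log(D_gamma)  - log(D_alpha2)) / total_beta,
           Proportion1 = (log(D_alpha2) - log(D_alpha1)) / total_beta)

print(round(products, 3))
print(round(props, 3))
```

The two proportions sum to 1 by construction, which is why only one of them carries independent information in a three-level hierarchy.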

R code for Information-based Diversity Partitioning for Any Number of Levels

Input data:
    abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
    struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)

Output:
    (1) Gamma (or total) diversity, alpha and beta diversity at each level.
    (2) Proportion of total beta information found at each level.
    (3) Differentiation (dissimilarity) for each level. For example, Differentiation1 measures the mean dissimilarity among populations (level 1) within a region; Differentiation2 measures the mean dissimilarity among regions (level 2), etc.

NOTE: iDIP cannot handle “blank” data or an “NA” entry. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

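Main function iDIP

    iDIP = function(abun, struc) {
      n = sum(abun); N = ncol(abun)
      ga = rowSums(abun)
      gp = ga[ga > 0] / n                      # pooled (ecosystem-level) relative frequencies
      G = sum(-gp * log(gp))                   # gamma entropy
      S = length(gp)

      H = nrow(struc)                          # number of hierarchical levels
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      # Lowest level (populations/communities): weights and alpha entropy
      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[H - 1] = sum(wi * Ai)

      # Intermediate levels: pool the populations that belong to each unit
      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          ai = matrix(0, ncol = NN, nrow = nrow(abun))
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              ai[, j] = abun[, II]
            } else {
              ai[, j] = rowSums(abun[, II])
            }
          }
          pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
          wi = colSums(ai) / sum(ai)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
          A[i - 1] = sum(wi * Ai)
        }
      }

      # Differentiation, proportion of beta information and beta diversity per level
      total = G - A[H - 1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      Gamma = exp(G); Alpha = exp(A)
      out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) = c(paste0("D_gamma"),
                        paste0("D_alpha", (H - 1):1),
                        paste0("D_beta", (H - 1):1),
                        paste0("Proportion", (H - 1):1),
                        paste0("Differentiation", (H - 1):1))
      return(out)
    }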
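The restriction on “blank” and “NA” entries can be handled before the data reach iDIP. The sketch below shows one way to do this and then recomputes the gamma diversity of order 1 directly from the pooled frequencies as a cross-check. `clean_abun` is a hypothetical helper, not part of the published code, and the NA in the first column is inserted here purely for illustration.

```r
# Hypothetical pre-processing helper (not part of iDIP): replace NA by 0 and
# stop if any population (column) contains no alleles/species at all.
clean_abun <- function(abun) {
  abun <- as.matrix(abun)
  abun[is.na(abun)] <- 0
  if (any(colSums(abun) == 0)) {
    stop("every population/community needs at least one allele/species")
  }
  abun
}

# Example data, with one entry deliberately set to NA.
Data <- cbind(c(1, NA, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
Data <- clean_abun(Data)

# Gamma diversity of order q = 1: exponential of the Shannon entropy of the
# pooled relative frequencies.
gp <- rowSums(Data) / sum(Data)
gp <- gp[gp > 0]
D_gamma <- exp(-sum(gp * log(gp)))
round(D_gamma, 3)   # 5.272, matching D_gamma in the example output
```

Replacing NA by 0 treats a missing observation as an absence, which is a modelling decision in its own right and may not suit every data set.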

Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called “Abun” and “Struc”, respectively) as described in the species diversity section, we also need to input a phylogenetic “Tree” (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the “Abun” matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle a “blank” or “NA” entry. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see the simple example given above for the species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity section for illustration. A simulated phylogenetic tree for 6 species is given below.

    # Input data
    Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
    rownames(Data) = paste0("Allele",1:6)
    Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
    Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

    # Run R function
    iDIPphylo(Data, Struc, Tree)

Output is given below:

                    [,1]
    Faith's PD   321.525
    mean_T        94.169
    PD_gamma     274.388
    PD_alpha2    255.194
    PD_alpha1    223.231
    PD_beta2       1.075
    PD_beta1       1.143
    PD_prop2       0.351
    PD_prop1       0.649
    PD_diff2       0.124
    PD_diff1       0.149

The “effective” in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith’s PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma).

(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2).

(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
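The first two output rows can be reproduced in base R without any tree package, which is a useful sanity check on how the branch lengths are read. The sketch below assumes the decimal placement of the branch lengths shown in the code (for example 16.66254448 for Allele1), and the node labels `n12`, `n123` and `n456` are hypothetical names for the three interior branches.

```r
# Branch lengths as read from the example Newick string (assumed decimal placement).
tip_len  <- c(Allele1 = 16.66254448, Allele2 = 28.86156926, Allele3 = 59.19367445,
              Allele4 = 9.67060281,  Allele5 = 49.65919121, Allele6 = 15.361314)
node_len <- c(n12 = 43.70264926, n123 = 43.49065302, n456 = 54.92297125)

# Interior branches lying on the path from the root to each tip.
path <- list(Allele1 = c("n12", "n123"), Allele2 = c("n12", "n123"), Allele3 = "n123",
             Allele4 = "n456", Allele5 = "n456", Allele6 = "n456")

# Pooled allele counts over the five populations (row sums of the example data).
counts <- c(Allele1 = 44, Allele2 = 19, Allele3 = 31,
            Allele4 = 41, Allele5 = 24, Allele6 = 6)

# Faith's PD: total branch length of the tree.
faith_pd <- sum(tip_len) + sum(node_len)

# mean_T: abundance-weighted mean distance from the root to the tips.
root_to_tip <- tip_len +
  vapply(path[names(tip_len)], function(b) sum(node_len[b]), numeric(1))
mean_T <- sum(counts / sum(counts) * root_to_tip)

round(c(faith_pd = faith_pd, mean_T = mean_T), 3)   # 321.525 and 94.169
```

The root-to-tip distances differ between tips, so this simulated tree is not ultrametric and the weighted mean mean_T takes the place of the tree depth T.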

R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

Input data:
    abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
    struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
    tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study

NOTE: iDIP cannot handle a “blank” or “NA” entry. You must replace “blank” or “NA” in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

Output:
    (1) Faith’s PD: the total sum of branch lengths of a phylogenetic tree.
    (2) mean_T: weighted (by species abundance) mean of the distances from the root node to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
    (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level.
    (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information found in Level 1 and Level 2, respectively.
    (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1 measures the mean phylogenetic dissimilarity among communities (Level 1) within a region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among regions (Level 2), etc.

Three packages, “ade4”, “ape” and “phytools”, must be installed first.

    install.packages("ade4"); library(ade4)
    install.packages("ape"); library(ape)
    install.packages("phytools"); library(phytools)

Main function iDIPphylo

    iDIPphylo = function(abun, struc, tree) {
      phyloData = newick2phylog(tree)
      Temp = as.matrix(abun[names(phyloData$leaves), ])
      nodenames = c(names(phyloData$leaves), names(phyloData$nodes))
      # M[i, b] = 1 if branch b lies on the path from the root to leaf i
      M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
                 dimnames = list(names(phyloData$leaves), nodenames))
      for (i in 1:length(phyloData$leaves)) {
        M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
      }
      # pA: branch (node) abundances in each population; pB: branch lengths
      pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
                  dimnames = list(nodenames, colnames(abun)))
      for (i in 1:ncol(abun)) { pA[, i] = Temp[, i] %*% M }
      pB = c(phyloData$leaves, phyloData$nodes)

      n = sum(abun); N = ncol(abun)
      ga = rowSums(pA)
      gp = ga / n; TT = sum(gp * pB)           # TT: abundance-weighted mean root-to-tip distance
      G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
      PD = sum(pB[gp > 0])                     # Faith's PD

      H = nrow(struc)
      A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
      Diff = numeric(H - 1); Prop = numeric(H - 1)

      # Lowest level (populations/communities)
      wi = colSums(abun) / n
      W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
      pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
      Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[H - 1] = sum(wi * Ai)

      # Intermediate levels: pool the populations that belong to each unit
      if (H > 2) {
        for (i in 2:(H - 1)) {
          I = unique(struc[i, ]); NN = length(I)
          pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
          for (j in 1:NN) {
            II = which(struc[i, ] == I[j])
            if (length(II) == 1) {
              pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
            } else {
              pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
            }
          }
          wi = ni / sum(ni)
          W[i - 1] = -sum(wi * log(wi))
          Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
          A[i - 1] = sum(wi * Ai)
        }
      }

      total = G - A[H - 1]
      Diff[1] = (G - A[1]) / W[1]
      Prop[1] = (G - A[1]) / total
      B[1] = exp(G) / exp(A[1])
      if (H > 2) {
        for (i in 2:(H - 1)) {
          Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
          Prop[i] = (A[i - 1] - A[i]) / total
          B[i] = exp(A[i - 1]) / exp(A[i])
        }
      }

      Gamma = exp(G); Alpha = exp(A)
      out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
      rownames(out) = c(paste("Faith's PD"),
                        paste("mean_T"),
                        paste0("PD_gamma"),
                        paste0("PD_alpha", (H - 1):1),
                        paste0("PD_beta", (H - 1):1),
                        paste0("PD_prop", (H - 1):1),
                        paste0("PD_diff", (H - 1):1))
      return(out)
    }


Ai=sapply(1Nfunction(k)-sum(pB[pi[k]gt0]pi[k][pi[k]gt0]TTlog(pi[k][pi[k]gt0]TT))) A[H-1]=sum(wiAi)

if(Hgt2) for(iin2(H-1))

I=unique(struc[i])NN=length(I) pi=matrix(0ncol=NNnrow=nrow(pA))ni=numeric(NN) for(jin1NN)

II=which(struc[i]==I[j]) if(length(II)==1)pi[j]=pA[II]sum(abun[II])ni[j]=sum(abun[II]) elsepi[j]=rowSums(pA[II])sum(abun[II])ni[j]=sum(abun[II])

pi=sapply(1NNfunction(k)ai[k]sum(ai[k])) wi=nisum(ni)

W[i-1]=-sum(wilog(wi)) Ai=sapply(1NNfunction(k)-sum(pB[pi[k]gt0]pi[k][pi[k]gt0]TTlog(pi[k][pi[k]gt0]TT)))

A[i-1]=sum(wiAi)

total=G-A[H-1] Diff[1]=(G-A[1])W[1] Prop[1]=(G-A[1])total

B[1]=exp(G)exp(A[1]) if(Hgt2)

14

for(iin2(H-1)) Diff[i]=(A[i-1]-A[i])(W[i]-W[i-1])

Prop[i]=(A[i-1]-A[i])total B[i]=exp(A[i-1])exp(A[i])

Gamma=exp(G)TTAlpha=exp(A)TTDiff=DiffProp=Prop Gamma=exp(G)Alpha=exp(A)Diff=DiffProp=Prop

out=matrix(c(PDTTGammaAlphaBPropDiff)ncol=1) out1=iDIP(abunstruc) out=cbind(out1out2)

rownames(out)lt-c(paste(FaithsPD) paste(mean_T)

paste0(PD_gamma) paste0(PD_alpha(H-1)1) paste0(PD_beta(H-1)1)

paste0(PD_prop(H-1)1) paste0(PD_diff(H-1)1) )

return(out)

Page 25: Diversity from genes to ecosystems: A unifying framework ...chao.stat.nthu.edu.tw › wordpress › paper › 128_pdf_appendix.pdf · the other hand, population genetics was initially

Here $I^{(1)}$ denotes the within-population phylogenetic information, $[I^{(2)} - I^{(1)}]$ denotes the among-population phylogenetic information within a region, and $[I_\gamma - I^{(2)}]$ denotes the among-region phylogenetic information. The maximum value for each of the latter two components is derived in the following theorem, so that we can obtain the corresponding differentiation measures.

Theorem C4: For the decomposition given in Eq. (C7) in the three-level hierarchy, we have the following inequalities under an ultrametric tree with depth $T$:

$$0 \le I_\gamma - I^{(2)} \le -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}, \tag{C8}$$

$$0 \le I^{(2)} - I^{(1)} \le -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln \frac{w_{jk}}{w_{+k}}. \tag{C9}$$

When all populations have identical allele relative frequency distributions, we have $I_\gamma = I^{(2)} = I^{(1)}$. When all populations are completely distinct phylogenetically (no shared branches across populations, though branches within a population may be shared), we have $I^{(2)} - I^{(1)} = -T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})$ and $I_\gamma - I^{(2)} = -T \sum_{k=1}^{K} w_{+k} \ln w_{+k}$.

The above theorem leads to the two normalized phylogenetic differentiation measures (given in Table 3 of the main text) in the range [0, 1]. Each of the two phylogenetic differentiation measures takes the minimum value of 0 when all populations have identical allele relative frequency distributions, and it takes the maximum value of 1 when all populations are phylogenetically completely distinct. All the derivation procedures, as well as the monotonicity and "true dissimilarity" properties, are parallel to those of allelic diversity and thus are omitted.
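The normalization described above can be written out explicitly by dividing each among-level component by its upper bound from (C8) and (C9). The following is a sketch only: the labels $\Delta_2^{PD}$ and $\Delta_1^{PD}$ are introduced here for illustration and may differ from the symbols in Table 3 of the main text, and we assume that $w_{jk}$ is the weight of population $j$ in region $k$ and $w_{+k}$ the total weight of region $k$.

```latex
\Delta_2^{PD} = \frac{I_\gamma - I^{(2)}}{-T \sum_{k=1}^{K} w_{+k} \ln w_{+k}},
\qquad
\Delta_1^{PD} = \frac{I^{(2)} - I^{(1)}}{-T \sum_{k=1}^{K} \sum_{j=1}^{J_k} w_{jk} \ln (w_{jk}/w_{+k})}
```

Under this reading, the first ratio measures differentiation among regions and the second measures differentiation among populations within a region.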

SUPPLEMENTARY INFORMATION

Figure S1. Schematic representation of the calculation of diversities at each level of the hierarchy. The five green rectangles represent local populations, the blue rectangles represent regions and the red rectangle represents the ecosystem. Shannon entropies $H_{*}$ are calculated from allele/species abundances at each level h of the hierarchy and are then transformed into effective numbers using eq. 1.

[Figure S1 diagram. Rows: level 3 (Ecosystem), level 2 (Region), level 1 (Population or Community). Columns: Within, Between, Total, Decomposition. Labelled quantities: relative abundances p_{i|11}, p_{i|21}, p_{i|31}, p_{i|42}, p_{i|52} for the populations, p_{i|+1} and p_{i|+2} for the regions and p_{i|++} for the ecosystem; entropies H_{α11}^{(1)} to H_{α52}^{(1)} and H_α^{(1)} at level 1, H_{α1}^{(2)}, H_{α2}^{(2)} and H_α^{(2)} at level 2, and H_γ for the ecosystem.]
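The transformation mentioned in the caption can be sketched in a few lines of R. This is an illustration only: it assumes that eq. 1 of the main text is the Hill number of order q = 1, D = exp(H), as the abstract describes, and the helper name `effective_number` is hypothetical.

```r
# Hypothetical helper: Shannon entropy of a vector of counts (or relative
# frequencies), converted to an effective number via D = exp(H).
effective_number <- function(x) {
  p <- x[x > 0] / sum(x)   # relative abundances, zero classes dropped
  H <- -sum(p * log(p))    # Shannon entropy (natural logarithm)
  exp(H)                   # Hill number of order q = 1
}

# Four equally abundant alleles behave like exactly four alleles.
d_even <- effective_number(c(5, 5, 5, 5))

# An uneven sample (10%, 70%, 20%) is worth fewer than three alleles.
d_uneven <- effective_number(c(1, 0, 7, 0, 2, 0))

print(c(even = d_even, uneven = d_uneven))
```

The uneven vector is the first population column of the worked example further on, so the same helper can be reused on any column of the abundance matrix.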

Figure S2. Example of an ultrametric tree where the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case the calculation of effective numbers is based on an extended set, where the first elements (5 in the present example) correspond to species/allele abundances and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example the set of abundances for the 8 nodes is as follows: $\{a_1, a_2, \cdots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \cdots, p_5, p_1 + p_2, p_1 + p_2 + p_3, p_4 + p_5\}$, where $p_i$ is the relative abundance of species i.

[Figure S2 tree: tips labelled p1, p2, p3, p4 and p5; internal nodes labelled p1 + p2, p4 + p5 and p1 + p2 + p3; root labelled MRCA.]
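The extended abundance set of Figure S2 can be built with a tip-by-node membership matrix. The sketch below uses hypothetical tip abundances (any five values summing to 1 would do) and illustrative object names; it is meant to show the bookkeeping, not to reproduce the authors' code.

```r
# Relative abundances of the five tips (hypothetical values that sum to 1).
p <- c(p1 = 0.10, p2 = 0.20, p3 = 0.30, p4 = 0.25, p5 = 0.15)

# Tip-by-node membership matrix: entry [i, j] is 1 when tip i is node j or
# descends from it. Columns 1-5 are the tips themselves; columns 6-8 are the
# internal nodes p1+p2, p1+p2+p3 and p4+p5 of the example tree.
M <- cbind(
  diag(5),
  c(1, 1, 0, 0, 0),   # node a6: ancestor of tips 1 and 2
  c(1, 1, 1, 0, 0),   # node a7: ancestor of tips 1, 2 and 3
  c(0, 0, 0, 1, 1)    # node a8: ancestor of tips 4 and 5
)
colnames(M) <- paste0("a", 1:8)

# Node abundances: each node receives the summed abundance of its tips.
a <- drop(p %*% M)
print(a)
```

The first five entries of `a` reproduce the tip abundances, and the last three are the sums attached to the internal nodes.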

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R Function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying a hierarchical structure matrix; see a simple example below.

Our R code can be applied to any number of levels. For simplicity, we just use a three-level hierarchical structure to illustrate how to input data. Consider there are two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as the following:

Ecosystem
    Region1
        Population1
        Population2
    Region2
        Population3
        Population4
        Population5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

           Pop1  Pop2  Pop3  Pop4  Pop5
Allele1       1    16     2    10    15
Allele2       0     0     0     5    14
Allele3       7    12    11     1     0
Allele4       0     5    14     1    21
Allele5       2     1     0    11    10
Allele6       0     1     3     2     0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure of any number of levels can be expressed in a similar manner.

          Pop1         Pop2         Pop3         Pop4         Pop5
Level3    Ecosystem    Ecosystem    Ecosystem    Ecosystem    Ecosystem
Level2    Region1      Region1      Region2      Region2      Region2
Level1    Population1  Population2  Population3  Population4  Population5
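To make the role of the structure matrix concrete, the sketch below pools the population columns of the example into their regions using the Level 2 row of the structure matrix. It uses only base R; the object names `region_counts` and `region_weights` are illustrative, and the pooling shown is one straightforward reading of how region-level abundances are formed rather than a copy of the authors' function.

```r
# Allele counts (rows) by population (columns), as in the table above.
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
dimnames(Data) <- list(paste0("Allele", 1:6), paste0("Pop", 1:5))

# Structure matrix: row 1 = ecosystem, row 2 = region, row 3 = population.
Struc <- rbind(rep("Ecosystem", 5),
               c(rep("Region1", 2), rep("Region2", 3)),
               paste0("Population", 1:5))

# Sum the columns that share a region label in row 2 of Struc.
regions <- unique(Struc[2, ])
region_counts <- sapply(regions, function(r) {
  rowSums(Data[, Struc[2, ] == r, drop = FALSE])
})

# Relative size of each region, used as its weight.
region_weights <- colSums(region_counts) / sum(Data)

print(region_counts)
print(round(region_weights, 3))
```

With these data Region1 holds 45 of the 165 sampled alleles and Region2 the remaining 120, so the two regions enter any weighted average unequally.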

For the above simple example, input data for R function iDIP (code given below) are shown below.

# Input data
Data=cbind(c(1,0,7,0,2,0),c(16,0,12,5,1,1),c(2,0,11,14,0,3),c(10,5,1,1,11,2),c(15,14,0,21,10,0))
Struc=rbind(rep("Ecosystem",5),c(rep("Region1",2),rep("Region2",3)),paste0("Population",1:5))

# Run R function
iDIP(Data,Struc)

Output is given below:

                     [,1]
D_gamma             5.272
D_alpha.2           4.679
D_alpha.1           3.540
D_beta.2            1.127
D_beta.1            1.322
Proportion.2        0.300
Proportion.1        0.700
Differentiation.2   0.204
Differentiation.1   0.310

We give simple interpretations for the above output (every "effective" quantity is in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha.2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta.2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma), up to rounding.

(1c) D_alpha.1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta.1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha.2), up to rounding.

(2) Proportion.2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion.1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation.2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation.1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
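The diversity, beta and proportion rows of the output can be cross-checked independently of iDIP with a few lines of base R. The sketch below is not the authors' function: the helper names `entropy` and `alpha_entropy` are hypothetical, and the proportion formula assumes that each level's share of beta information is the log of its beta diversity divided by the sum of those logs, which is what the ratio of entropy differences reduces to.

```r
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c(1, 1, 2, 2, 2)   # region of each population column

# Shannon entropy of a vector of counts (zero counts contribute nothing).
entropy <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Mean within-group entropy, each group weighted by its share of all counts.
alpha_entropy <- function(counts) {
  sum(colSums(counts) / sum(counts) * apply(counts, 2, entropy))
}

# Pool populations into their regions.
region_counts <- sapply(1:2, function(r) rowSums(Data[, region == r, drop = FALSE]))

H_gamma  <- entropy(rowSums(Data))         # pooled ecosystem
H_alpha2 <- alpha_entropy(region_counts)   # within regions
H_alpha1 <- alpha_entropy(Data)            # within populations

# Effective numbers, beta diversities and shares of beta information.
D <- exp(c(gamma = H_gamma, alpha2 = H_alpha2, alpha1 = H_alpha1))
beta <- c(beta2 = unname(D["gamma"] / D["alpha2"]),
          beta1 = unname(D["alpha2"] / D["alpha1"]))
prop <- log(beta) / sum(log(beta))
names(prop) <- c("prop2", "prop1")

print(round(c(D, beta, prop), 3))
```

Because beta at each level is a ratio of consecutive effective numbers, the products quoted in (1b) and (1c) hold exactly before rounding.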

# R code for Information-based Diversity Partitioning for Any Number of Levels
# input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
# output:
#   (1) Gamma (or total) diversity, alpha and beta diversity at each level.
#   (2) Proportion of total beta information found at each level.
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation.1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation.2 measures the mean dissimilarity among regions (level 2), etc.
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank" or "NA"
#       in your data by 0 or any numerical value. Also, there must be at least one
#       species/allele in a population or a community.

# Main function iDIP
iDIP=function(abun,struc){
  n=sum(abun);N=ncol(abun)
  ga=rowSums(abun)
  gp=ga[ga>0]/n
  G=sum(-gp*log(gp))                       # gamma (ecosystem) entropy
  S=length(gp)
  H=nrow(struc)                            # number of hierarchical levels
  A=numeric(H-1);W=numeric(H-1);B=numeric(H-1)
  Diff=numeric(H-1);Prop=numeric(H-1)
  wi=colSums(abun)/n                       # population weights
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]))
  pi=sapply(1:N,function(k) abun[,k]/sum(abun[,k]))
  Ai=sapply(1:N,function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])))
  A[H-1]=sum(wi*Ai)                        # alpha entropy at the lowest level
  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,]);NN=length(I)
      ai=matrix(0,ncol=NN,nrow=nrow(abun))
      for(j in 1:NN){                      # pool the populations of each group
        II=which(struc[i,]==I[j])
        if(length(II)==1){ai[,j]=abun[,II]
        }else{ai[,j]=rowSums(abun[,II])}
      }
      pi=sapply(1:NN,function(k) ai[,k]/sum(ai[,k]))
      wi=colSums(ai)/sum(ai)
      W[i-1]=-sum(wi*log(wi))
      Ai=sapply(1:NN,function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])))
      A[i-1]=sum(wi*Ai)
    }
  }
  total=G-A[H-1]                           # total beta information
  Diff[1]=(G-A[1])/W[1]
  Prop[1]=(G-A[1])/total
  B[1]=exp(G)/exp(A[1])
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1])
      Prop[i]=(A[i-1]-A[i])/total
      B[i]=exp(A[i-1])/exp(A[i])
    }
  }
  Gamma=exp(G);Alpha=exp(A);Diff=Diff;Prop=Prop
  out=matrix(c(Gamma,Alpha,B,Prop,Diff),ncol=1)
  rownames(out)<-c(paste0("D_gamma"),
                   paste0("D_alpha.",(H-1):1),
                   paste0("D_beta.",(H-1):1),
                   paste0("Proportion.",(H-1):1),
                   paste0("Differentiation.",(H-1):1)
  )
  return(out)
}

Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described for the species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for the species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity for illustration. A simulated phylogenetic tree for 6 species is given below.

# Input data
Data=cbind(c(1,0,7,0,2,0),c(16,0,12,5,1,1),c(2,0,11,14,0,3),c(10,5,1,1,11,2),c(15,14,0,21,10,0))
rownames(Data)=paste0("Allele",1:6)
Struc=rbind(rep("Ecosystem",5),c(rep("Region1",2),rep("Region2",3)),paste0("Community",1:5))
Tree=c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

# Run R function
iDIPphylo(Data,Struc,Tree)

Output is given below:

                 [,1]
Faith's PD    321.525
mean_T         94.169
PD_gamma      274.388
PD_alpha.2    255.194
PD_alpha.1    223.231
PD_beta.2       1.075
PD_beta.1       1.143
PD_prop.2       0.351
PD_prop.1       0.649
PD_diff.2       0.124
PD_diff.1       0.149

The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha.2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta.2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma), up to rounding.

(3c) PD_alpha.1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta.1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha.2), up to rounding.

(4) PD_prop.2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop.1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff.2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.5%. PD_diff.1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
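The first two rows of the phylogenetic output can be cross-checked without any phylogenetics package. The sketch below hard-codes the branch lengths of the example tree as read from its Newick string (treat the exact decimal values as an assumption) and recomputes Faith's PD and mean_T from the pooled allele counts. The object names are illustrative and the code is independent of iDIPphylo.

```r
# Terminal branch lengths of the six tips.
tip_len <- c(Allele1 = 16.66254448, Allele2 = 28.86156926, Allele3 = 59.19367445,
             Allele4 = 9.67060281, Allele5 = 49.65919121, Allele6 = 15.361314)

# Internal branch lengths: the clades (Allele1,Allele2),
# ((Allele1,Allele2),Allele3) and (Allele4,Allele5,Allele6).
int_len <- c(n12 = 43.70264926, n123 = 43.49065302, n456 = 54.92297125)

# Faith's PD: total length of all branches in the tree.
faith_pd <- sum(tip_len) + sum(int_len)

# Root-to-tip distance: a tip's own branch plus the branches of its ancestors.
ancestors <- c(
  int_len[["n12"]] + int_len[["n123"]],   # Allele1
  int_len[["n12"]] + int_len[["n123"]],   # Allele2
  int_len[["n123"]],                      # Allele3
  int_len[["n456"]],                      # Allele4
  int_len[["n456"]],                      # Allele5
  int_len[["n456"]]                       # Allele6
)
root_to_tip <- tip_len + ancestors

# Pooled allele counts across the five communities of the example.
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
abundance <- rowSums(Data)

# mean_T: abundance-weighted mean of the root-to-tip distances.
mean_T <- sum(abundance * root_to_tip) / sum(abundance)

print(round(c(faith_pd = faith_pd, mean_T = mean_T), 3))
```

The root-to-tip distances differ between tips, so this simulated tree is not ultrametric and mean_T is a genuine weighted average rather than a single tree depth.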

# R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
# input data:
#   abun:  alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
#   tree:  a Newick-format phylogenetic tree spanned by all focal species considered in a study
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in
#       your data by 0 or any numerical value. Also, there must be at least one
#       species/allele in a population or a community.
# output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root node
#       to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, alpha and beta PD for each level.
#   (4) PD_prop.1 and PD_prop.2 measure the proportions of total phylogenetic beta
#       information found in Level 1 and Level 2, respectively.
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff.1
#       measures the mean phylogenetic dissimilarity among communities (Level 1) within a
#       region (Level 2); PD_diff.2 measures the mean phylogenetic dissimilarity among
#       regions (Level 2), etc.

# Three packages "ade4", "ape" and "phytools" must be installed first.
install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

# Main function iDIPphylo
iDIPphylo=function(abun,struc,tree){
  phyloData<-newick2phylog(tree)
  Temp<-as.matrix(abun[names(phyloData$leaves),])
  nodenames=c(names(phyloData$leaves),names(phyloData$nodes))
  # M[i,j]=1 when node j lies on the path from the root to leaf i
  M=matrix(0,nrow=length(phyloData$leaves),ncol=length(nodenames),dimnames=list(names(phyloData$leaves),nodenames))
  for(i in 1:length(phyloData$leaves)){M[i,][unlist(phyloData$paths[i])]=rep(1,length(unlist(phyloData$paths[i])))}
  # pA: node-by-population abundances; pB: branch length of each node
  pA=matrix(0,ncol=ncol(abun),nrow=length(nodenames),dimnames=list(nodenames,colnames(abun)))
  for(i in 1:ncol(abun)){pA[,i]=Temp[,i]%*%M}
  pB=c(phyloData$leaves,phyloData$nodes)
  n=sum(abun);N=ncol(abun)
  ga=rowSums(pA)
  gp=ga/n
  TT=sum(gp*pB)                            # mean_T
  G=sum(-pB[gp>0]*gp[gp>0]/TT*log(gp[gp>0]/TT))
  PD=sum(pB[gp>0])                         # Faith's PD
  H=nrow(struc)
  A=numeric(H-1);W=numeric(H-1);B=numeric(H-1)
  Diff=numeric(H-1);Prop=numeric(H-1)
  wi=colSums(abun)/n
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]))
  pi=sapply(1:N,function(k) pA[,k]/sum(abun[,k]))
  Ai=sapply(1:N,function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)))
  A[H-1]=sum(wi*Ai)
  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,]);NN=length(I)
      pi=matrix(0,ncol=NN,nrow=nrow(pA));ni=numeric(NN)
      for(j in 1:NN){
        II=which(struc[i,]==I[j])
        if(length(II)==1){pi[,j]=pA[,II]/sum(abun[,II]);ni[j]=sum(abun[,II])
        }else{pi[,j]=rowSums(pA[,II])/sum(abun[,II]);ni[j]=sum(abun[,II])}
      }
      # pi=sapply(1:NN,function(k) ai[,k]/sum(ai[,k]))
      wi=ni/sum(ni)
      W[i-1]=-sum(wi*log(wi))
      Ai=sapply(1:NN,function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)))
      A[i-1]=sum(wi*Ai)
    }
  }
  total=G-A[H-1]
  Diff[1]=(G-A[1])/W[1]
  Prop[1]=(G-A[1])/total
  B[1]=exp(G)/exp(A[1])
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1])
      Prop[i]=(A[i-1]-A[i])/total
      B[i]=exp(A[i-1])/exp(A[i])
    }
  }
  Gamma=exp(G)*TT;Alpha=exp(A)*TT;Diff=Diff;Prop=Prop
  # Gamma=exp(G);Alpha=exp(A);Diff=Diff;Prop=Prop
  out=matrix(c(PD,TT,Gamma,Alpha,B,Prop,Diff),ncol=1)
  # out1=iDIP(abun,struc)
  # out=cbind(out1,out2)
  rownames(out)<-c(paste("Faith's PD"),
                   paste("mean_T"),
                   paste0("PD_gamma"),
                   paste0("PD_alpha.",(H-1):1),
                   paste0("PD_beta.",(H-1):1),
                   paste0("PD_prop.",(H-1):1),
                   paste0("PD_diff.",(H-1):1)
  )
  return(out)
}


Page 27: Diversity from genes to ecosystems: A unifying framework ...chao.stat.nthu.edu.tw › wordpress › paper › 128_pdf_appendix.pdf · the other hand, population genetics was initially

2

Figure S2. Example of an ultrametric tree in which the terminal nodes represent species/alleles with associated abundances and the interior nodes represent speciation/coalescent events. In this case, the calculation of effective numbers is based on an extended set in which the first elements (five in the present example) correspond to species/allele abundances, and all other abundances correspond to the abundance of the elements descended from the internal nodes. In the present example, the set of abundances for the 8 nodes is $\{a_1, a_2, \ldots, a_5, a_6, a_7, a_8\} = \{p_1, p_2, \ldots, p_5,\ p_1 + p_2,\ p_1 + p_2 + p_3,\ p_4 + p_5\}$, where $p_i$ is the relative abundance of species $i$.

[Figure S2: tree with tips labelled p1 to p5, internal nodes labelled p1 + p2, p1 + p2 + p3 and p4 + p5, and the root labelled MRCA.]
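The extended abundance set described in the caption can be sketched in a few lines of R. The tip abundances below are hypothetical values chosen only for illustration, and the ordering of the three internal-node entries is an assumption read from the figure labels.

```r
# Hypothetical relative abundances for the five tips (p1..p5); not from the paper.
p <- c(0.30, 0.25, 0.20, 0.15, 0.10)

# Extended abundance set: the tips come first, followed by one entry per
# internal node, each equal to the summed abundance of its descendant tips.
a <- c(p,
       p[1] + p[2],          # node joining tips 1 and 2
       p[1] + p[2] + p[3],   # node joining tips 1, 2 and 3
       p[4] + p[5])          # node joining tips 4 and 5
names(a) <- paste0("a", 1:8)
print(a)
```

The root (MRCA) carries no branch of its own in this sketch, so it does not add an entry to the set.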

R code for Information-based Diversity Partitioning (iDIP) Under Multi-Level Hierarchical Structures for Species and Phylogenetic Diversities

Species/Allelic Diversity (R function iDIP)

Input should include two data matrices (called "Abun" and "Struc", respectively):

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in each population or community.

(b) Struc: specifying a hierarchical structure matrix; see the simple example below.

Our R code can be applied to any number of levels. For simplicity, we use a three-level hierarchical structure to illustrate how to input data. Consider two regions (1 and 2) in an ecosystem. In Region 1 there are two populations, and in Region 2 there are three populations. The hierarchical structure is displayed as follows:

Ecosystem
    Region 1
        Population 1
        Population 2
    Region 2
        Population 3
        Population 4
        Population 5

Suppose the raw allele frequencies are given in the following matrix (alleles in rows and populations in columns):

           Pop1  Pop2  Pop3  Pop4  Pop5
Allele1       1    16     2    10    15
Allele2       0     0     0     5    14
Allele3       7    12    11     1     0
Allele4       0     5    14     1    21
Allele5       2     1     0    11    10
Allele6       0     1     3     2     0

The hierarchical structure matrix for this simple example should be input as a matrix with levels in rows and populations in columns (Level 1 = population level, Level 2 = region level, Level 3 = ecosystem level). A hierarchical structure with any number of levels can be expressed in a similar manner.

          Pop1         Pop2         Pop3         Pop4         Pop5
Level 3   Ecosystem    Ecosystem    Ecosystem    Ecosystem    Ecosystem
Level 2   Region1      Region1      Region2      Region2      Region2
Level 1   Population1  Population2  Population3  Population4  Population5
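A structure matrix of this form is only meaningful if every unit at a finer level sits inside exactly one unit of the level above it. The following is a minimal sketch of such a check; the helper name `is_nested` is hypothetical and is not part of the original appendix.

```r
# Hypothetical helper (not in the original appendix): TRUE when each unit in a
# finer level (lower row) maps to exactly one unit in the coarser level above it.
is_nested <- function(struc) {
  for (i in seq_len(nrow(struc) - 1)) {
    # For each finer-level label, count the distinct coarser-level labels it carries.
    n_parents <- tapply(struc[i, ], struc[i + 1, ], function(x) length(unique(x)))
    if (any(n_parents != 1)) return(FALSE)
  }
  TRUE
}

# Structure matrix of the three-level example: coarsest level in the first row.
Struc <- rbind(rep("Ecosystem", 5),
               c(rep("Region1", 2), rep("Region2", 3)),
               paste0("Population", 1:5))
is_nested(Struc)
```

Running a check like this before the analysis catches mislabelled columns, for example a population assigned to two different regions.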

For the above simple example, the input data for the R function iDIP (code given below) are as follows.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Population",1:5))

# Run R function
iDIP(Data, Struc)

Output is given below:

                    [,1]
D_gamma            5.272
D_alpha2           4.679
D_alpha1           3.540
D_beta2            1.127
D_beta1            1.322
Proportion2        0.300
Proportion1        0.700
Differentiation2   0.204
Differentiation1   0.310

We give simple interpretations of the above output (every "effective" number is in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).

(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).

(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
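The multiplicative relations in (1b) and (1c), and the proportions in (2), can be re-derived from the reported values alone. The sketch below assumes that the proportions are ratios of differences in log diversity, which is how the iDIP code computes them from the entropies; small discrepancies come from the three-decimal rounding of the inputs.

```r
# Reported output values (rounded to three decimals in the appendix).
D_gamma  <- 5.272
D_alpha2 <- 4.679; D_alpha1 <- 3.540
D_beta2  <- 1.127; D_beta1  <- 1.322

# Multiplicative decomposition: alpha times beta at one level gives the
# diversity of the level above.
D_alpha2 * D_beta2   # close to D_gamma
D_alpha1 * D_beta1   # close to D_alpha2

# Share of the total beta information (difference in log diversity) per level.
total_info  <- log(D_gamma) - log(D_alpha1)
Proportion2 <- (log(D_gamma)  - log(D_alpha2)) / total_info
Proportion1 <- (log(D_alpha2) - log(D_alpha1)) / total_info
round(c(Proportion2 = Proportion2, Proportion1 = Proportion1), 3)
```

By construction the two proportions sum to 1, since the two log differences add up to the total.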

# R code for Information-based Diversity Partitioning for Any Number of Levels
#
# Input data:
#   abun:  alleles-by-population frequency matrix (or species-by-community frequency matrix)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
#
# Output:
#   (1) Gamma (or total) diversity, and alpha and beta diversity at each level.
#   (2) Proportion of total beta information found at each level.
#   (3) Differentiation (dissimilarity) for each level. For example, Differentiation1
#       measures the mean dissimilarity among populations (level 1) within a region;
#       Differentiation2 measures the mean dissimilarity among regions (level 2), etc.
#
# NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank" or "NA"
# in your data by 0 or any numerical value. Also, there must be at least one
# species/allele in each population or community.
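The two input requirements in the NOTE can be checked before calling the function. The helper below is a hypothetical addition, not part of the original appendix, and is only a sketch of one way to fail early with a readable message.

```r
# Hypothetical helper (not in the original appendix): stop early if the
# abundance matrix violates the input requirements stated in the NOTE.
check_abun <- function(abun) {
  abun <- as.matrix(abun)
  if (anyNA(abun)) {
    stop("abun contains NA or blank entries; replace them with 0 or a numerical value")
  }
  if (!is.numeric(abun)) {
    stop("abun must contain only numerical values")
  }
  if (any(colSums(abun) <= 0)) {
    stop("every population/community must contain at least one species/allele")
  }
  invisible(TRUE)
}

# Example: the first two populations of the worked example pass the check.
check_abun(cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1)))
```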

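# Main function iDIP
iDIP = function(abun, struc) {
  n = sum(abun); N = ncol(abun)
  ga = rowSums(abun)
  gp = ga[ga > 0] / n
  G = sum(-gp * log(gp))                    # gamma (total) entropy
  S = length(gp)
  H = nrow(struc)                           # number of hierarchical levels
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  # Lowest level (individual populations/communities)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) abun[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
  A[H - 1] = sum(wi * Ai)

  # Intermediate levels: pool the columns that belong to the same unit
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      ai = matrix(0, ncol = NN, nrow = nrow(abun))
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          ai[, j] = abun[, II]
        } else {
          ai[, j] = rowSums(abun[, II])
        }
      }
      pi = sapply(1:NN, function(k) ai[, k] / sum(ai[, k]))
      wi = colSums(ai) / sum(ai)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pi[, k][pi[, k] > 0] * log(pi[, k][pi[, k] > 0])))
      A[i - 1] = sum(wi * Ai)
    }
  }

  # Differentiation, proportion of beta information, and beta diversity per level
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }

  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste0("D_gamma"),
                     paste0("D_alpha", (H - 1):1),
                     paste0("D_beta", (H - 1):1),
                     paste0("Proportion", (H - 1):1),
                     paste0("Differentiation", (H - 1):1))
  return(out)
}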
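As a cross-check on the function, the gamma and regional alpha diversities of the worked example can be computed directly from the allele counts. This is a minimal sketch that bypasses iDIP; the helper `shannon` and the vector `region` are names introduced here for illustration.

```r
# Allele counts from the worked example (alleles in rows, populations in columns).
Data <- cbind(c(1, 0, 7, 0, 2, 0), c(16, 0, 12, 5, 1, 1), c(2, 0, 11, 14, 0, 3),
              c(10, 5, 1, 1, 11, 2), c(15, 14, 0, 21, 10, 0))
region <- c("Region1", "Region1", "Region2", "Region2", "Region2")

# Shannon entropy of a vector of counts (zero counts contribute nothing).
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Gamma: entropy of the pooled allele counts, converted to an effective number.
D_gamma <- exp(shannon(rowSums(Data)))

# Regional alpha: pool the populations of each region, then exponentiate the
# size-weighted mean of the regional entropies.
pooled <- sapply(unique(region), function(r) rowSums(Data[, region == r, drop = FALSE]))
w <- colSums(pooled) / sum(pooled)
D_alpha2 <- exp(sum(w * apply(pooled, 2, shannon)))

# Regional beta follows from the multiplicative decomposition.
D_beta2 <- D_gamma / D_alpha2
round(c(D_gamma = D_gamma, D_alpha2 = D_alpha2, D_beta2 = D_beta2), 3)
```

The rounded values agree with the D_gamma, D_alpha2 and D_beta2 rows reported in the example output (5.272, 4.679 and 1.127).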

Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) described for species diversity, we also need to input a phylogenetic "Tree" in Newick format.

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick-format tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in each population or community.

(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for the 6 species is given below.

# Input data
Data = cbind(c(1,0,7,0,2,0), c(16,0,12,5,1,1), c(2,0,11,14,0,3), c(10,5,1,1,11,2), c(15,14,0,21,10,0))
rownames(Data) = paste0("Allele",1:6)
Struc = rbind(rep("Ecosystem",5), c(rep("Region1",2), rep("Region2",3)), paste0("Community",1:5))
Tree = c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

# Run R function
iDIPphylo(Data, Struc, Tree)

Output is given below:

                [,1]
Faith's PD   321.525
mean_T        94.169
PD_gamma     274.388
PD_alpha2    255.194
PD_alpha1    223.231
PD_beta2       1.075
PD_beta1       1.143
PD_prop2       0.351
PD_prop1       0.649
PD_diff2       0.124
PD_diff1       0.149

The "effective" in the following interpretation is in the sense of equally abundant and equally divergent lineages/communities/regions.

(1) The total branch length (Faith's PD) in the phylogenetic tree is 321.525.

(2) The weighted (by species abundance) mean of the distances from the root node to each of the tips is 94.169.

(3a) PD_gamma = 274.388 means that the effective total branch length in the ecosystem (total phylogenetic diversity) is 274.388.

(3b) PD_alpha2 = 255.194 means that the effective total branch length per region is 255.194. PD_beta2 = 1.075 means that there are 1.08 region equivalents. Thus 255.194 x 1.075 = 274.388 (= PD_gamma).

(3c) PD_alpha1 = 223.231 means that the effective total branch length per population within each region is 223.231. PD_beta1 = 1.143 implies that there are 1.14 population equivalents per region. Here 223.231 x 1.143 = 255.194 (= PD_alpha2).

(4) PD_prop2 = 0.351 means that the proportion of total phylogenetic beta information found at the regional level is 35.1%. PD_prop1 = 0.649 means that the proportion of total phylogenetic beta information found at the community level is 64.9%.

(5) PD_diff2 = 0.124 implies that the mean phylogenetic differentiation among regions is 0.124. This can be interpreted in the following effective sense: the mean proportion of non-shared lineages in a region is around 12.4%. PD_diff1 = 0.149 implies that the mean phylogenetic differentiation among communities within a region is 0.149, i.e., the mean proportion of non-shared lineages in a community is around 14.9%.
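Items (1), (2) and (3a) can be reproduced without any tree package by listing the branches of the example tree by hand. The sketch below rests on one assumption: the branch lengths are read from the example Newick string with their decimal points restored, a reading that is consistent with the reported Faith's PD and mean_T. The node labels n12, n123 and n456 are names introduced here.

```r
# Branch lengths of the example tree (decimal placement is an assumption).
tip_len  <- c(Allele1 = 16.66254448, Allele2 = 28.86156926, Allele3 = 59.19367445,
              Allele4 = 9.67060281,  Allele5 = 49.65919121, Allele6 = 15.361314)
node_len <- c(n12 = 43.70264926, n123 = 43.49065302, n456 = 54.92297125)

# Pooled allele counts over the five populations (row sums of the data matrix).
counts <- c(44, 19, 31, 41, 24, 6)
p <- counts / sum(counts)

# Abundance of each internal branch = summed abundance of its descendant tips.
node_abun <- c(n12 = p[1] + p[2], n123 = p[1] + p[2] + p[3], n456 = p[4] + p[5] + p[6])

L <- c(tip_len, node_len)   # all branch lengths
a <- c(p, node_abun)        # matching branch abundances

faith_pd <- sum(L)          # total branch length
mean_T   <- sum(L * a)      # abundance-weighted mean root-to-tip distance
PD_gamma <- exp(-sum(L * a / mean_T * log(a / mean_T)))  # effective total branch length
round(c(faith_pd = faith_pd, mean_T = mean_T, PD_gamma = PD_gamma), 3)
```

Dividing PD_gamma by mean_T would give the effective number of equally divergent lineages instead of an effective branch length.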

# R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels
#
# Input data:
#   abun:  alleles-by-population frequency matrix (or species-by-community frequency matrix)
#   struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
#   tree:  a Newick-format phylogenetic tree spanned by all focal species considered in a study
#
# NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA"
# in your data by 0 or any numerical value. Also, there must be at least one
# species/allele in each population or community.
#
# Output:
#   (1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
#   (2) mean_T: weighted (by species abundance) mean of the distances from the root node
#       to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
#   (3) Gamma (or total) phylogenetic diversity (PD) of order 1, and alpha and beta PD for each level.
#   (4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information
#       found in Level 1 and Level 2, respectively.
#   (5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1
#       measures the mean phylogenetic dissimilarity among communities (Level 1) within a
#       region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among
#       regions (Level 2), etc.
#
# Three packages ("ade4", "ape" and "phytools") must be installed first.
install.packages("ade4"); library(ade4)
install.packages("ape"); library(ape)
install.packages("phytools"); library(phytools)

# Main function iDIPphylo

iDIPphylo = function(abun, struc, tree) {
  phyloData <- newick2phylog(tree)
  Temp <- as.matrix(abun[names(phyloData$leaves), ])
  nodenames = c(names(phyloData$leaves), names(phyloData$nodes))

  # M[i, j] = 1 when branch j lies on the path from the root to leaf i
  M = matrix(0, nrow = length(phyloData$leaves), ncol = length(nodenames),
             dimnames = list(names(phyloData$leaves), nodenames))
  for (i in 1:length(phyloData$leaves)) {
    M[i, ][unlist(phyloData$paths[i])] = rep(1, length(unlist(phyloData$paths[i])))
  }

  # pA: branch-by-community abundances; pB: branch lengths
  pA = matrix(0, ncol = ncol(abun), nrow = length(nodenames),
              dimnames = list(nodenames, colnames(abun)))
  for (i in 1:ncol(abun)) {
    pA[, i] = Temp[, i] %*% M
  }
  pB = c(phyloData$leaves, phyloData$nodes)

  n = sum(abun); N = ncol(abun)
  ga = rowSums(pA)
  gp = ga / n
  TT = sum(gp * pB)                         # mean_T
  G = sum(-pB[gp > 0] * gp[gp > 0] / TT * log(gp[gp > 0] / TT))
  PD = sum(pB[gp > 0])                      # Faith's PD

  H = nrow(struc)
  A = numeric(H - 1); W = numeric(H - 1); B = numeric(H - 1)
  Diff = numeric(H - 1); Prop = numeric(H - 1)

  # Lowest level (individual populations/communities)
  wi = colSums(abun) / n
  W[H - 1] = -sum(wi[wi > 0] * log(wi[wi > 0]))
  pi = sapply(1:N, function(k) pA[, k] / sum(abun[, k]))
  Ai = sapply(1:N, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
  A[H - 1] = sum(wi * Ai)

  # Intermediate levels: pool the columns that belong to the same unit
  if (H > 2) {
    for (i in 2:(H - 1)) {
      I = unique(struc[i, ]); NN = length(I)
      pi = matrix(0, ncol = NN, nrow = nrow(pA)); ni = numeric(NN)
      for (j in 1:NN) {
        II = which(struc[i, ] == I[j])
        if (length(II) == 1) {
          pi[, j] = pA[, II] / sum(abun[, II]); ni[j] = sum(abun[, II])
        } else {
          pi[, j] = rowSums(pA[, II]) / sum(abun[, II]); ni[j] = sum(abun[, II])
        }
      }
      wi = ni / sum(ni)
      W[i - 1] = -sum(wi * log(wi))
      Ai = sapply(1:NN, function(k) -sum(pB[pi[, k] > 0] * pi[, k][pi[, k] > 0] / TT * log(pi[, k][pi[, k] > 0] / TT)))
      A[i - 1] = sum(wi * Ai)
    }
  }

  # Differentiation, proportion of beta information, and beta diversity per level
  total = G - A[H - 1]
  Diff[1] = (G - A[1]) / W[1]
  Prop[1] = (G - A[1]) / total
  B[1] = exp(G) / exp(A[1])
  if (H > 2) {
    for (i in 2:(H - 1)) {
      Diff[i] = (A[i - 1] - A[i]) / (W[i] - W[i - 1])
      Prop[i] = (A[i - 1] - A[i]) / total
      B[i] = exp(A[i - 1]) / exp(A[i])
    }
  }

  # Alternative scaling present in the original listing but overridden there:
  # Gamma = exp(G) / TT; Alpha = exp(A) / TT
  Gamma = exp(G); Alpha = exp(A)
  out = matrix(c(PD, TT, Gamma, Alpha, B, Prop, Diff), ncol = 1)
  rownames(out) <- c(paste("Faith's PD"), paste("mean_T"),
                     paste0("PD_gamma"),
                     paste0("PD_alpha", (H - 1):1),
                     paste0("PD_beta", (H - 1):1),
                     paste0("PD_prop", (H - 1):1),
                     paste0("PD_diff", (H - 1):1))
  return(out)
}

Page 29: Diversity from genes to ecosystems: A unifying framework ...chao.stat.nthu.edu.tw › wordpress › paper › 128_pdf_appendix.pdf · the other hand, population genetics was initially

4

Pop1 Pop2 Pop3 Pop4 Pop5

Allele1 1 16 2 10 15

Allele2 0 0 0 5 14

Allele3 7 12 11 1 0

Allele4 0 5 14 1 21

Allele5 2 1 0 11 10

Allele6 0 1 3 2 0

Thehierarchicalstructurematrixforthissimpleexampleshouldbeinputasamatrixwithlevelsinrowsandpopulationsincolumns(Level1=populationlevelLevel2=regionlevelLevel3=ecosystemlevel)Hierarchicalstructureof

anynumberoflevelscanbeexpressedinasimilarmanner

Pop1 Pop2 Pop3 Pop4 Pop5

Level3 Ecosystem Ecosystem Ecosystem Ecosystem Ecosystem

Level2 Region1 Region1 Region2 Region2 Region2

Level1 Population1 Population2 Population3 Population4 Population5

FortheabovesimpleexampleinputdataforRfunctioniDIP(codegivenbelow)

areshownbelow InputdataData=cbind(c(107020)c(16012511)c(20111403)c(10511112)c(151

4021100))Struc=rbind(rep(Ecosystem5)c(rep(Region12)rep(Region23))paste0(Population15))

RunRfunction iDIP(DataStruc)

5

Output is given below

[1]

D_gamma 5272

D_alpha2 4679

D_alpha1 3540

D_beta2 1127

D_beta1 1322

Proportion2 0300

Proportion1 0700

Differentiation2 0204

Differentiation1 0310

We give simple interpretations for the above output (all the ldquoeffectiverdquo is in asenseofequallyabundantallelespopulationsregions)(1a) D_gamma = 5272 is interpreted as that the effective number of alleles in the

ecosystem (total diversity) is 5272

(1b) D_alpha2 = 4679 is interpreted as that each region contains 4679 allele

equivalents

D-beta2 = 1127 implies that there are 113 region equivalents Thus 4679 x

1127 = 5272 (=D_gamma)

(1c) D_alpha1 = 3540 is interpreted as that each population within a region contains

3540 allele equivalents

D_beta1 =1322 is interpreted as that there are 132 population equivalents per

region

Here 1322 x 3540 = 4679 species per region (= D_alpha2)

(2) Proportion2 = 030 means that the proportion of total beta information found at

the regional level is 30

Proportion1 = 070 means that the proportion of total beta information found at

the population level is 70

(3) Differentiation2 =0204 implies that the mean differentiationdissimilarity among

regions is 0204 This can be interpreted as the following effective sense the mean

proportion of non-shared alleles in a region is around 204

Differentiation1 =0310 implies that the mean differentiationdissimilarity among

6

populations within a region is 031 ie the mean proportion of non-shared alleles

in a population is around 310

RcodeforInformation-basedDiversityPartitioningforAnyNumberofLevelsinputdata

abunallelesbypopulationfrequencymatrixdata(orspeciesby communityfrequencymatrixdata) struclevelbypopulationhierarchicalstructurematrix(orlevelbycommunity

hierarchicalstructurematrix)output

(1)Gamma(ortotal)diversityalphaandbetadiversityateachlevel (2)Proportionoftotalbetainformationfoundateachlevel(3)Differentiation(dissimilarity)foreachlevelForexampleDifferentiation1

measuresthemeandissimilarityamongpopulations(level1)withinaregionDifferentiation2measuresthemeandissimilarityamongregions(level2)etc

NOTEiDIPcannothandleldquoblankrdquodataorldquoNArdquoentryYoumustreplaceldquoblankrdquo orldquoNArdquoinyourdataby0oranynumericalvalueAlsotheremustbeatleast onespeciesalleleinapopulationoracommunity

MainfunctioniDIP

iDIP=function(abunstruc) n=sum(abun)N=ncol(abun) ga=rowSums(abun)

gp=ga[gagt0]n G=sum(-gplog(gp)) S=length(gp)

H=nrow(struc) A=numeric(H-1)W=numeric(H-1)B=numeric(H-1)

7

Diff=numeric(H-1)Prop=numeric(H-1)

wi=colSums(abun)n W[H-1]=-sum(wi[wigt0]log(wi[wigt0])) pi=sapply(1Nfunction(k)abun[k]sum(abun[k]))

Ai=sapply(1Nfunction(k)-sum(pi[k][pi[k]gt0]log(pi[k][pi[k]gt0]))) A[H-1]=sum(wiAi) if(Hgt2)

for(iin2(H-1)) I=unique(struc[i])NN=length(I) ai=matrix(0ncol=NNnrow=nrow(abun))

for(jin1NN) II=which(struc[i]==I[j]) if(length(II)==1)ai[j]=abun[II]

elseai[j]=rowSums(abun[II]) pi=sapply(1NNfunction(k)ai[k]sum(ai[k]))

wi=colSums(ai)sum(ai) W[i-1]=-sum(wilog(wi)) Ai=sapply(1NNfunction(k)-sum(pi[k][pi[k]gt0]log(pi[k][pi[k]gt0])))

A[i-1]=sum(wiAi)

total=G-A[H-1] Diff[1]=(G-A[1])W[1] Prop[1]=(G-A[1])total

B[1]=exp(G)exp(A[1]) if(Hgt2) for(iin2(H-1))

Diff[i]=(A[i-1]-A[i])(W[i]-W[i-1]) Prop[i]=(A[i-1]-A[i])total B[i]=exp(A[i-1])exp(A[i])

Gamma=exp(G)Alpha=exp(A)Diff=DiffProp=Prop

8

out=matrix(c(GammaAlphaBPropDiff)ncol=1) rownames(out)lt-c(paste0(D_gamma)

paste0(D_alpha(H-1)1) paste0(D_beta(H-1)1) paste0(Proportion(H-1)1)

paste0(Differentiation(H-1)1) ) return(out)

9

PhylogeneticDiversity(RfunctioniDIPphylo)

Inadditiontothetwodatamatrices(calledldquoAbunrdquoldquoStrucrdquorespectively)asdescribedinthespeciesdiversitywealsoneedtoinputaphylogeneticldquoTreerdquoinNewicktreeformat)

(a) Abunspecifyingspeciesalleles(rows)raworrelativefrequenciesineachpopulationcommunity(columns) NOTEspeciesnamesintheldquoAbunrdquomatrixshouldbeexactlythesameas

thoseintheuploadedNewicktreeformatiDIPcannothandleldquoblankrdquoorldquoNArdquoentryYoumustreplaceldquoblankrdquoorldquoNArdquoinyourdataby0oranynumericalvalueAlsotheremustbeatleastonespeciesalleleinapopulationora

community(b) Strucspecifyinghierarchicalstructurematrixseethesimpleexamplegiven

aboveforthespeciesdiversity

(c) Treeaphylogenetictreespannedbyallspeciesconsideredinthestudy

Hereweusethesamehierarchicalstructureandalleleabundancesdataasinthe

speciesdiversityforillustrationAsimulatedphylogenetictreefor6speciesaregivenbelow

InputdataData=cbind(c(107020)c(16012511)c(20111403)c(10511112)c(1514021100))

rownames(Data)=paste0(Allele16)Struc=rbind(rep(Ecosystem5)c(rep(Region12)rep(Region23))paste0(Community15))

Tree=c((((Allele11666254448Allele22886156926)4370264926Allele35919367445)4349065302(Allele4967060281Allele54965919121Allele615361314)5492297125))

RunRfunction iDIPphylo(DataStrucTree)

Output is given below

[1]

10

Faiths PD 321525

mean_T 94169

PD_gamma 274388

PD_alpha2 255194

PD_alpha1 223231

PD_beta2 1075

PD_beta1 1143

PD_prop2 0351

PD_prop1 0649

PD_diff2 0124

PD_diff1 0149

Theldquoeffectiverdquointhefollowinginterpretationisinthesenseofequallyabundantandequallydivergentlineagescommunitiesregions

(1)Thetotalbranchlength(FaithrsquosPD)inthephylogenetictreeis321525 (2)Theweighted(byspeciesabundance)meanofthedistancesfromrootnode

toeachofthetipsis94169

(3a)PD_gamma=274388 is interpreted as that the effective total branch length in

the ecosystem (total phylogenetic diversity) is 274388(3b)PD_alpha2=255194is interpreted as that the effective total branch length per

region is 255194

PD-beta2 = 1075 means that there are 108 region equivalents Thus 255194x1075=274388(=PD_gamma)

(3c)PD_alpha1=223231 is interpreted as that the effective total branch length per

population within each region is 223231PD_beta1 =1143 implies that there are 114 population equivalents per region

Here223231 x1143=255194(=PD_alpha2)

(4) PD_prop2=0351meansthattheproportionoftotalphylogeneticbetainformationfoundintheregionallevelis351

PD_prop1=0649meansthattheproportionoftotalphylogeneticbetainformationfoundinthecommunitylevelis649

(5) PD_diff2=0124impliesthatthemeanphylogeneticdifferentiationamong

regionsis0124Thiscanbeinterpretedasthefollowingeffectivesensethemeanproportionofnon-sharedlineagesinaregionisaround125

11

PD_diff1=0149impliesthatthemeanphylogeneticdifferentiationamongcommunitieswithinaregionis0149iethemeanproportionofnon-shared

lineagesinacommunityisaround149

R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

Input data:

abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data).
struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix).
tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study.

NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace blank or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

Output:

(1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
(2) mean_T: weighted (by species abundance) mean of the distances from the root node to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
(3) Gamma (or total) phylogenetic diversity (PD) of order 1, and alpha and beta PD for each level.
(4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information found in Level 1 and Level 2, respectively.
(5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1 measures the mean phylogenetic dissimilarity among communities (Level 1) within a region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among regions (Level 2), etc.

Three packages ("ade4", "ape" and "phytools") must be installed first:

install.packages("ade4"); library(ade4)
install.packages("ape"); library(ape)
install.packages("phytools"); library(phytools)

Main function iDIPphylo:

iDIPphylo=function(abun,struc,tree){
  phyloData<-newick2phylog(tree)
  Temp<-as.matrix(abun[names(phyloData$leaves),])
  nodenames=c(names(phyloData$leaves),names(phyloData$nodes))
  # M[i,j]=1 if branch j lies on the path from the root to tip i
  M=matrix(0,nrow=length(phyloData$leaves),ncol=length(nodenames),
           dimnames=list(names(phyloData$leaves),nodenames))
  for(i in 1:length(phyloData$leaves)){
    M[i,][unlist(phyloData$paths[i])]=rep(1,length(unlist(phyloData$paths[i])))
  }
  # pA: abundance descending from each branch, per community
  pA=matrix(0,ncol=ncol(abun),nrow=length(nodenames),
            dimnames=list(nodenames,colnames(abun)))
  for(i in 1:ncol(abun)){pA[,i]=Temp[,i]%*%M}
  pB=c(phyloData$leaves,phyloData$nodes)   # branch lengths
  n=sum(abun); N=ncol(abun)
  ga=rowSums(pA)
  gp=ga/n
  TT=sum(gp*pB)
  G=sum(-pB[gp>0]*gp[gp>0]/TT*log(gp[gp>0]/TT))
  PD=sum(pB[gp>0])
  H=nrow(struc)
  A=numeric(H-1); W=numeric(H-1); B=numeric(H-1)
  Diff=numeric(H-1); Prop=numeric(H-1)
  wi=colSums(abun)/n
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]))
  pi=sapply(1:N,function(k) pA[,k]/sum(abun[,k]))
  Ai=sapply(1:N,function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)))
  A[H-1]=sum(wi*Ai)
  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,]); NN=length(I)
      pi=matrix(0,ncol=NN,nrow=nrow(pA)); ni=numeric(NN)
      for(j in 1:NN){
        II=which(struc[i,]==I[j])
        if(length(II)==1){pi[,j]=pA[,II]/sum(abun[,II]); ni[j]=sum(abun[,II])
        }else{pi[,j]=rowSums(pA[,II])/sum(abun[,II]); ni[j]=sum(abun[,II])}
      }
      # pi=sapply(1:NN,function(k) ai[,k]/sum(ai[,k]))
      wi=ni/sum(ni)
      W[i-1]=-sum(wi*log(wi))
      Ai=sapply(1:NN,function(k) -sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)))
      A[i-1]=sum(wi*Ai)
    }
  }
  total=G-A[H-1]
  Diff[1]=(G-A[1])/W[1]
  Prop[1]=(G-A[1])/total
  B[1]=exp(G)/exp(A[1])
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1])
      Prop[i]=(A[i-1]-A[i])/total
      B[i]=exp(A[i-1])/exp(A[i])
    }
  }
  Gamma=exp(G)*TT; Alpha=exp(A)*TT; Diff=Diff; Prop=Prop
  # Gamma=exp(G); Alpha=exp(A); Diff=Diff; Prop=Prop
  out=matrix(c(PD,TT,Gamma,Alpha,B,Prop,Diff),ncol=1)
  # out1=iDIP(abun,struc); out=cbind(out1,out2)
  rownames(out)<-c(paste("Faith's PD"),
                   paste("mean_T"),
                   paste0("PD_gamma"),
                   paste0("PD_alpha",(H-1):1),
                   paste0("PD_beta",(H-1):1),
                   paste0("PD_prop",(H-1):1),
                   paste0("PD_diff",(H-1):1)
  )
  return(out)
}


Output is given below:

                      [,1]
D_gamma              5.272
D_alpha2             4.679
D_alpha1             3.540
D_beta2              1.127
D_beta1              1.322
Proportion2          0.300
Proportion1          0.700
Differentiation2     0.204
Differentiation1     0.310

We give simple interpretations for the above output (every "effective" quantity is in the sense of equally abundant alleles/populations/regions).

(1a) D_gamma = 5.272 means that the effective number of alleles in the ecosystem (total diversity) is 5.272.

(1b) D_alpha2 = 4.679 means that each region contains 4.679 allele equivalents. D_beta2 = 1.127 implies that there are 1.13 region equivalents. Thus 4.679 x 1.127 = 5.272 (= D_gamma).

(1c) D_alpha1 = 3.540 means that each population within a region contains 3.540 allele equivalents. D_beta1 = 1.322 means that there are 1.32 population equivalents per region. Here 1.322 x 3.540 = 4.679 allele equivalents per region (= D_alpha2).

(2) Proportion2 = 0.30 means that the proportion of total beta information found at the regional level is 30%. Proportion1 = 0.70 means that the proportion of total beta information found at the population level is 70%.

(3) Differentiation2 = 0.204 implies that the mean differentiation/dissimilarity among regions is 0.204. This can be interpreted in the following effective sense: the mean proportion of non-shared alleles in a region is around 20.4%. Differentiation1 = 0.310 implies that the mean differentiation/dissimilarity among populations within a region is 0.31, i.e., the mean proportion of non-shared alleles in a population is around 31.0%.
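The relations among these quantities can be verified from the printed output alone. The sketch below assumes the values are read with three decimals (D_gamma = 5.272, and so on); the variable names are illustrative. It checks the multiplicative chain and recomputes the two proportions as log(beta) divided by log(gamma/alpha1), which is what the proportion of total beta information reduces to when the entropies are replaced by the logarithms of the corresponding diversities.

```r
# Reported order-1 diversities (values as printed, i.e. rounded)
D_gamma <- 5.272
D_alpha <- c(level2 = 4.679, level1 = 3.540)
D_beta  <- c(level2 = 1.127, level1 = 1.322)

# Multiplicative chain: alpha1 * beta1 ~ alpha2 and alpha2 * beta2 ~ gamma
chain <- c(D_alpha[["level1"]] * D_beta[["level1"]],
           D_alpha[["level2"]] * D_beta[["level2"]])
print(round(chain, 2))

# Share of the total beta information carried by each level
total_beta_info <- log(D_gamma / D_alpha[["level1"]])
proportion <- log(D_beta) / total_beta_info
print(round(proportion, 2))
```

Up to rounding, the two proportions come out as 0.30 (regional level) and 0.70 (population level), matching Proportion2 and Proportion1.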

R code for Information-based Diversity Partitioning for Any Number of Levels

Input data:

abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data).
struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix).

Output:

(1) Gamma (or total) diversity, and alpha and beta diversity at each level.
(2) Proportion of total beta information found at each level.
(3) Differentiation (dissimilarity) for each level. For example, Differentiation1 measures the mean dissimilarity among populations (level 1) within a region; Differentiation2 measures the mean dissimilarity among regions (level 2), etc.

NOTE: iDIP cannot handle "blank" data or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.
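The two requirements in the NOTE can be enforced before calling the function. The following is a minimal sketch on a hypothetical 3-allele by 2-population table (the matrix and its names are made up for illustration): missing entries are set to 0, and the script stops if any population ends up with no alleles at all.

```r
# Hypothetical raw table: 3 alleles (rows) x 2 populations (columns), one missing entry
abun <- matrix(c(4, NA, 1,
                 0,  2, 3), nrow = 3,
               dimnames = list(paste0("Allele", 1:3), paste0("Pop", 1:2)))

abun[is.na(abun)] <- 0            # replace blank/NA entries by 0
empty <- colSums(abun) == 0       # populations without a single allele
if (any(empty)) {
  stop("empty population(s): ", paste(colnames(abun)[empty], collapse = ", "))
}
```

Replacing NA by 0 treats a missing count as an absence, so it is only appropriate when that reading is defensible for the data at hand.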

Main function iDIP:

iDIP=function(abun,struc){
  n=sum(abun); N=ncol(abun)
  ga=rowSums(abun)
  gp=ga[ga>0]/n
  G=sum(-gp*log(gp))          # Shannon entropy of the pooled ecosystem
  S=length(gp)
  H=nrow(struc)
  A=numeric(H-1); W=numeric(H-1); B=numeric(H-1)
  Diff=numeric(H-1); Prop=numeric(H-1)
  wi=colSums(abun)/n          # population weights
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]))
  pi=sapply(1:N,function(k) abun[,k]/sum(abun[,k]))
  Ai=sapply(1:N,function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])))
  A[H-1]=sum(wi*Ai)
  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,]); NN=length(I)
      ai=matrix(0,ncol=NN,nrow=nrow(abun))
      for(j in 1:NN){
        II=which(struc[i,]==I[j])
        if(length(II)==1){ai[,j]=abun[,II]
        }else{ai[,j]=rowSums(abun[,II])}
      }
      pi=sapply(1:NN,function(k) ai[,k]/sum(ai[,k]))
      wi=colSums(ai)/sum(ai)
      W[i-1]=-sum(wi*log(wi))
      Ai=sapply(1:NN,function(k) -sum(pi[,k][pi[,k]>0]*log(pi[,k][pi[,k]>0])))
      A[i-1]=sum(wi*Ai)
    }
  }
  total=G-A[H-1]
  Diff[1]=(G-A[1])/W[1]
  Prop[1]=(G-A[1])/total
  B[1]=exp(G)/exp(A[1])
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1])
      Prop[i]=(A[i-1]-A[i])/total
      B[i]=exp(A[i-1])/exp(A[i])
    }
  }
  Gamma=exp(G); Alpha=exp(A); Diff=Diff; Prop=Prop
  out=matrix(c(Gamma,Alpha,B,Prop,Diff),ncol=1)
  rownames(out)<-c(paste0("D_gamma"),
                   paste0("D_alpha",(H-1):1),
                   paste0("D_beta",(H-1):1),
                   paste0("Proportion",(H-1):1),
                   paste0("Differentiation",(H-1):1)
  )
  return(out)
}
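As a sanity check on what the function computes, the simplest two-level case (ecosystem and populations only) can be worked by hand. The toy matrix below is hypothetical and not taken from the paper: two equally sized populations that share no alleles, each with two equally frequent alleles. Under these assumptions the expected answer is known in advance: 4 effective alleles in total, 2 per population, 2 population equivalents, and a differentiation of 1.

```r
# Hypothetical toy data: 4 alleles (rows) x 2 populations (columns), no shared alleles
abun <- cbind(Pop1 = c(1, 1, 0, 0), Pop2 = c(0, 0, 1, 1))

# Shannon entropy of a frequency vector, ignoring zero entries
shannon <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

n  <- sum(abun)
G  <- shannon(rowSums(abun) / n)     # entropy of the pooled ecosystem
wi <- colSums(abun) / n              # population weights
A  <- sum(wi * apply(abun, 2, function(x) shannon(x / sum(x))))  # mean within-population entropy
W  <- shannon(wi)                    # entropy of the weights

D_gamma <- exp(G)
D_alpha <- exp(A)
D_beta  <- D_gamma / D_alpha
Differentiation <- (G - A) / W
print(c(D_gamma = D_gamma, D_alpha = D_alpha, D_beta = D_beta,
        Differentiation = Differentiation))
```

These are the same steps the function performs at its lowest level; the loop over the rows of struc repeats them after pooling populations into regions.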

Phylogenetic Diversity (R function iDIPphylo)

In addition to the two data matrices (called "Abun" and "Struc", respectively) as described for species diversity, we also need to input a phylogenetic "Tree" (in Newick tree format).

(a) Abun: specifying species/alleles (rows) raw or relative frequencies in each population/community (columns). NOTE: species names in the "Abun" matrix should be exactly the same as those in the uploaded Newick tree. iDIP cannot handle "blank" or "NA" entries. You must replace "blank" or "NA" in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

(b) Struc: specifying the hierarchical structure matrix; see the simple example given above for the species diversity.

(c) Tree: a phylogenetic tree spanned by all species considered in the study.

Here we use the same hierarchical structure and allele abundance data as in the species diversity example for illustration. A simulated phylogenetic tree for 6 species is given below.

Input data:

Data=cbind(c(107020)c(16012511)c(20111403)c(10511112)c(1514021100))
rownames(Data)=paste0("Allele",1:6)
Struc=rbind(rep("Ecosystem",5),c(rep("Region1",2),rep("Region2",3)),paste0("Community",1:5))
Tree=c("(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);")

Run R function:

iDIPphylo(Data,Struc,Tree)

Output is given below:

                 [,1]
Faith's PD    321.525
mean_T         94.169
PD_gamma      274.388
PD_alpha2     255.194
PD_alpha1     223.231
PD_beta2        1.075
PD_beta1        1.143
PD_prop2        0.351
PD_prop1        0.649
PD_diff2        0.124
PD_diff1        0.149
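The first row of this output can be reproduced without any phylogenetics package, because Faith's PD is simply the sum of all branch lengths in the tree. The following base-R sketch assumes the Newick string reads as shown below, with each branch length following a colon; the variable names are illustrative only.

```r
# Newick string of the simulated 6-tip tree (assumed reading of the input above)
newick <- "(((Allele1:16.66254448,Allele2:28.86156926):43.70264926,Allele3:59.19367445):43.49065302,(Allele4:9.67060281,Allele5:49.65919121,Allele6:15.361314):54.92297125);"

# Branch lengths are the numbers that directly follow each ':' in the string
lengths_txt    <- regmatches(newick, gregexpr("(?<=:)[0-9.]+", newick, perl = TRUE))[[1]]
branch_lengths <- as.numeric(lengths_txt)

faith_pd <- sum(branch_lengths)
print(round(faith_pd, 3))   # 321.525
```

The nine branch lengths sum to the reported Faith's PD. For real trees, parsing with `ape::read.tree(text = newick)` and summing the `edge.length` component is the more robust route, since the regular expression used here does not handle branch lengths in scientific notation or labels that contain colons.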

Page 33: Diversity from genes to ecosystems: A unifying framework ...chao.stat.nthu.edu.tw › wordpress › paper › 128_pdf_appendix.pdf · the other hand, population genetics was initially

8

out=matrix(c(GammaAlphaBPropDiff)ncol=1) rownames(out)lt-c(paste0(D_gamma)

paste0(D_alpha(H-1)1) paste0(D_beta(H-1)1) paste0(Proportion(H-1)1)

paste0(Differentiation(H-1)1) ) return(out)

9

PhylogeneticDiversity(RfunctioniDIPphylo)

Inadditiontothetwodatamatrices(calledldquoAbunrdquoldquoStrucrdquorespectively)asdescribedinthespeciesdiversitywealsoneedtoinputaphylogeneticldquoTreerdquoinNewicktreeformat)

(a) Abunspecifyingspeciesalleles(rows)raworrelativefrequenciesineachpopulationcommunity(columns) NOTEspeciesnamesintheldquoAbunrdquomatrixshouldbeexactlythesameas

thoseintheuploadedNewicktreeformatiDIPcannothandleldquoblankrdquoorldquoNArdquoentryYoumustreplaceldquoblankrdquoorldquoNArdquoinyourdataby0oranynumericalvalueAlsotheremustbeatleastonespeciesalleleinapopulationora

community(b) Strucspecifyinghierarchicalstructurematrixseethesimpleexamplegiven

aboveforthespeciesdiversity

(c) Treeaphylogenetictreespannedbyallspeciesconsideredinthestudy

Hereweusethesamehierarchicalstructureandalleleabundancesdataasinthe

speciesdiversityforillustrationAsimulatedphylogenetictreefor6speciesaregivenbelow

InputdataData=cbind(c(107020)c(16012511)c(20111403)c(10511112)c(1514021100))

rownames(Data)=paste0(Allele16)Struc=rbind(rep(Ecosystem5)c(rep(Region12)rep(Region23))paste0(Community15))

Tree=c((((Allele11666254448Allele22886156926)4370264926Allele35919367445)4349065302(Allele4967060281Allele54965919121Allele615361314)5492297125))

RunRfunction iDIPphylo(DataStrucTree)

Output is given below

[1]

10

Faiths PD 321525

mean_T 94169

PD_gamma 274388

PD_alpha2 255194

PD_alpha1 223231

PD_beta2 1075

PD_beta1 1143

PD_prop2 0351

PD_prop1 0649

PD_diff2 0124

PD_diff1 0149

Theldquoeffectiverdquointhefollowinginterpretationisinthesenseofequallyabundantandequallydivergentlineagescommunitiesregions

(1)Thetotalbranchlength(FaithrsquosPD)inthephylogenetictreeis321525 (2)Theweighted(byspeciesabundance)meanofthedistancesfromrootnode

toeachofthetipsis94169

(3a)PD_gamma=274388 is interpreted as that the effective total branch length in

the ecosystem (total phylogenetic diversity) is 274388(3b)PD_alpha2=255194is interpreted as that the effective total branch length per

region is 255194

PD-beta2 = 1075 means that there are 108 region equivalents Thus 255194x1075=274388(=PD_gamma)

(3c)PD_alpha1=223231 is interpreted as that the effective total branch length per

population within each region is 223231PD_beta1 =1143 implies that there are 114 population equivalents per region

Here223231 x1143=255194(=PD_alpha2)

(4) PD_prop2=0351meansthattheproportionoftotalphylogeneticbetainformationfoundintheregionallevelis351

PD_prop1=0649meansthattheproportionoftotalphylogeneticbetainformationfoundinthecommunitylevelis649

(5) PD_diff2=0124impliesthatthemeanphylogeneticdifferentiationamong

regionsis0124Thiscanbeinterpretedasthefollowingeffectivesensethemeanproportionofnon-sharedlineagesinaregionisaround125

11

PD_diff1=0149impliesthatthemeanphylogeneticdifferentiationamongcommunitieswithinaregionis0149iethemeanproportionofnon-shared

lineagesinacommunityisaround149

R code for Phylogenetic-Information-Based Decomposition for Any Number of Levels

Input data:

abun: alleles-by-population frequency matrix data (or species-by-community frequency matrix data)
struc: level-by-population hierarchical structure matrix (or level-by-community hierarchical structure matrix)
tree: a Newick-format phylogenetic tree spanned by all focal species considered in a study

NOTE: iDIP cannot handle "blank" or "NA" entries. You must replace blank or "NA" entries in your data by 0 or any numerical value. Also, there must be at least one species/allele in a population or a community.

Output:

(1) Faith's PD: the total sum of branch lengths of a phylogenetic tree.
(2) mean_T: the weighted (by species abundance) mean of the distances from the root node to each of the tips in a phylogenetic tree. For an ultrametric tree, mean_T = tree depth.
(3) Gamma (or total) phylogenetic diversity (PD) of order 1, and alpha and beta PD for each level.
(4) PD_prop1 and PD_prop2 measure the proportions of total phylogenetic beta information found in Level 1 and Level 2, respectively.
(5) Phylogenetic differentiation (dissimilarity) for each level. For example, PD_diff1 measures the mean phylogenetic dissimilarity among communities (Level 1) within a region (Level 2); PD_diff2 measures the mean phylogenetic dissimilarity among regions (Level 2), etc.
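To see how a differentiation measure of this kind behaves at its two extremes, the sketch below works through a two-level case without a tree, using plain Shannon entropy. This is a simplified, non-phylogenetic analogue and not the iDIPphylo function itself. The toy matrices and the names `shannon` and `differentiation` are hypothetical.

```r
# Shannon entropy of a probability vector, ignoring zero entries.
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Two-level differentiation: (gamma entropy - alpha entropy) / entropy of community weights.
differentiation <- function(abun) {
  n  <- sum(abun)
  wi <- colSums(abun) / n                      # community weights
  G  <- shannon(rowSums(abun) / n)             # gamma (pooled) entropy
  A  <- sum(wi * apply(abun, 2, function(x) shannon(x / sum(x))))  # weighted alpha entropy
  (G - A) / shannon(wi)
}

# Hypothetical toy data: rows are species, columns are communities.
no_shared  <- matrix(c(10, 0,
                        0, 10), nrow = 2, byrow = TRUE)  # no species in common
all_shared <- matrix(5, nrow = 2, ncol = 2)              # identical communities

d_max <- differentiation(no_shared)
d_min <- differentiation(all_shared)
print(c(d_max, d_min))
```

Communities with no species in common give a differentiation of 1 and identical communities give 0, so intermediate values such as 0.124 or 0.149 indicate a modest share of non-shared lineages.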

Three packages, "ade4", "ape" and "phytools", must be installed first:

install.packages("ade4")
library(ade4)
install.packages("ape")
library(ape)
install.packages("phytools")
library(phytools)

Main function iDIPphylo:

iDIPphylo=function(abun,struc,tree){
  # Parse the Newick tree (ade4) and order the abundance rows like the tree tips
  phyloData<-newick2phylog(tree)
  Temp<-as.matrix(abun[names(phyloData$leaves),])
  nodenames=c(names(phyloData$leaves),names(phyloData$nodes))

  # M[i,j]=1 if node j lies on the path from the root to tip i
  M=matrix(0,nrow=length(phyloData$leaves),ncol=length(nodenames),
           dimnames=list(names(phyloData$leaves),nodenames))
  for(i in 1:length(phyloData$leaves)){
    M[i,][unlist(phyloData$paths[i])]=rep(1,length(unlist(phyloData$paths[i])))
  }

  # pA: abundance descended from each node, per community; pB: branch lengths
  pA=matrix(0,ncol=ncol(abun),nrow=length(nodenames),
            dimnames=list(nodenames,colnames(abun)))
  for(i in 1:ncol(abun)){pA[,i]=Temp[,i]%*%M}
  pB=c(phyloData$leaves,phyloData$nodes)
  n=sum(abun);N=ncol(abun)

  # Gamma (pooled) phylogenetic entropy G, mean_T (TT) and Faith's PD
  ga=rowSums(pA)
  gp=ga/n
  TT=sum(gp*pB)
  G=sum(-pB[gp>0]*gp[gp>0]/TT*log(gp[gp>0]/TT))
  PD=sum(pB[gp>0])

  H=nrow(struc)
  A=numeric(H-1);W=numeric(H-1);B=numeric(H-1)
  Diff=numeric(H-1);Prop=numeric(H-1)

  # Lowest level: each column of abun is one community (population)
  wi=colSums(abun)/n
  W[H-1]=-sum(wi[wi>0]*log(wi[wi>0]))
  pi=sapply(1:N,function(k)pA[,k]/sum(abun[,k]))
  Ai=sapply(1:N,function(k)-sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)))
  A[H-1]=sum(wi*Ai)

  # Intermediate levels: pool the communities that belong to the same group
  if(H>2){
    for(i in 2:(H-1)){
      I=unique(struc[i,]);NN=length(I)
      pi=matrix(0,ncol=NN,nrow=nrow(pA));ni=numeric(NN)
      for(j in 1:NN){
        II=which(struc[i,]==I[j])
        if(length(II)==1){pi[,j]=pA[,II]/sum(abun[,II]);ni[j]=sum(abun[,II])}
        else{pi[,j]=rowSums(pA[,II])/sum(abun[,II]);ni[j]=sum(abun[,II])}
      }
      wi=ni/sum(ni)
      W[i-1]=-sum(wi*log(wi))
      Ai=sapply(1:NN,function(k)-sum(pB[pi[,k]>0]*pi[,k][pi[,k]>0]/TT*log(pi[,k][pi[,k]>0]/TT)))
      A[i-1]=sum(wi*Ai)
    }
  }

  # Beta, proportion and differentiation for each level
  total=G-A[H-1]
  Diff[1]=(G-A[1])/W[1]
  Prop[1]=(G-A[1])/total
  B[1]=exp(G)/exp(A[1])
  if(H>2){
    for(i in 2:(H-1)){
      Diff[i]=(A[i-1]-A[i])/(W[i]-W[i-1])
      Prop[i]=(A[i-1]-A[i])/total
      B[i]=exp(A[i-1])/exp(A[i])
    }
  }

  # Convert entropies to effective total branch lengths
  Gamma=exp(G)*TT;Alpha=exp(A)*TT

  out=matrix(c(PD,TT,Gamma,Alpha,B,Prop,Diff),ncol=1)
  rownames(out)<-c("Faith's PD","mean_T","PD_gamma",
                   paste0("PD_alpha",(H-1):1),paste0("PD_beta",(H-1):1),
                   paste0("PD_prop",(H-1):1),paste0("PD_diff",(H-1):1))
  return(out)
}
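The quantities computed by the function can be summarised in formulas. The notation below is introduced here for illustration and is not taken from the source: $L_i$ is the length of branch $i$ (`pB`), $a_i$ is the pooled relative abundance descended from branch $i$ (`gp`), $A_k$ and $W_k$ are the entries of the vectors `A` and `W`, and the conventions $A_0 = G$ and $W_0 = 0$ are assumed so that one expression covers every level.

```latex
\begin{align*}
T &= \sum_i L_i\, a_i, &
G &= -\sum_i L_i\, \frac{a_i}{T} \ln\frac{a_i}{T}, \\
\mathrm{PD}_\gamma &= T\, e^{G}, &
\mathrm{PD}_{\alpha,k} &= T\, e^{A_k}, \\
\mathrm{PD}_{\beta,k} &= e^{A_{k-1} - A_k}, &
\mathrm{PD}_{\mathrm{prop},k} &= \frac{A_{k-1} - A_k}{G - A_{H-1}}, \\
\mathrm{PD}_{\mathrm{diff},k} &= \frac{A_{k-1} - A_k}{W_k - W_{k-1}}, &
k &= 1, \dots, H-1 .
\end{align*}
```

Here $k$ follows the order of the function's internal vectors, so $k = 1$ is the highest split (regions in the example) and is reported with suffix $H-1$, while $k = H-1$ is the community level and is reported with suffix 1. Because the beta terms telescope, $\mathrm{PD}_\gamma = \mathrm{PD}_{\alpha,H-1} \prod_k \mathrm{PD}_{\beta,k}$, which is the multiplicative relation used in the interpretation of the example output.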
