38
Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K., Ugland, K. I., Borges, P. A. V., Rigal, F., Cardoso, P. and Whittaker, R. J. 2014. The gambin model provides a superior fit to species abundance distributions with a single free parameter: evidence, implementation and interpretation. – Ecography doi: 10.1111/ecog.00861 Supplementary material

Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

Ecography ECOG-00861Matthews, T. J., Borregaard, M. K., Ugland, K. I., Borges, P. A. V., Rigal, F., Cardoso, P. and Whittaker, R. J. 2014. The gambin model provides a superior fit to species abundance distributions with a single free parameter: evidence, implementation and interpretation. – Ecography doi: 10.1111/ecog.00861

Supplementary material

Page 2: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

1

Appendix 1. The maximum likelihood estimation of the gambin α parameter, and examples of use of the gambin R package.

The maximum likelihood estimation of the gambin α parameter

In order to fit the gambin model, the gamma distribution (i.e. the fitness frequencies) needs to be linked to the actual population sizes. In this context fitness refers to the ecological success of a species, as measured by the chance of achieving a large average population size. Since the extreme species abundance curves are the logseries and lognormal curves, it is convenient to regard the population sizes on a log-scale, i.e. the discrete octaves 0, 1, 2, 3, … where octave 0 relates to 1 individual, octave 1 relates to 2 or 3 individuals, octave 2 relates to 4, 5, 6 or 7 individuals, and so on. By way of an example, let us assume that the most abundant species in the sample is represented by 6719 individuals. Since the 12th octave is the range of individuals from 4096 to 8191, then the most abundant species in the sample is represented by the number 12. Thus, all species in the considered sample are represented by one of the 13 numbers 0, 1, 2, …, 12. In order to take into account abiotic and biotic stochasticity, the assignment of a species’ abundance in the 13 available octaves is determined through a binomial process. The problem can then be viewed as linking the gamma distribution with the probability of success in a binomial process with 12 trials.

Since the abundance of any species is always a finite number, it is possible to establish an easy procedure for deducing the probability density for the binomial probabilities by simply cutting the gamma distribution at the 99% percentile. That is, for any value of the shape parameter α (i.e. the community parameter), let c99(α) be the number such that 99% of the density is covered:

99( )1

0

1 0.99( )

cxx e dx

αα

α− − =

Γ∫

We then subdivide the interval [0, c99(α)] into 100 equally long successive pieces, and assume that the fitness of the species takes one of the 100 values p = pj = j/100 (j = 1, 2, …, 100) with probability

99( ) /1001

( 1) 99( ) /100

1( ) / 0.99( )

jcx

jj c

G x e dxα

α

α

αα

− −

=Γ∫

Note that this function defines the link between the shape parameter α and the probability that the probability of success achieves the value p = pj = j/100 (j = 1, 2, …, 100). If the number of observed species in the samples is denoted SOBS, and the number of species with fitness pj = j/100 is SOBS*Gj(α), then the number of species being represented with an abundance in the kth octave is

121̀2* ( )* 1

100 100

k k

OBS jj jS G

−⎛ ⎞⎛ ⎞ ⎛ ⎞−⎜ ⎟⎜ ⎟ ⎜ ⎟

⎝ ⎠ ⎝ ⎠⎝ ⎠ for k = 0, 1, 2, …, 12.

Page 3: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

2

Since species with different fitness frequencies may have population sizes in the same geometric class, the predicted number of species in any octave k is obtained as the sum of the contribution from all possible fitness frequencies:

ˆ * ( )k OBS kS S P α= , where

12100

1

1̀2( ) ( )* 1

100 100

k k

k jj

j jP Gk

α α−

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= −⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

∑ for k = 0, 1, 2, …, 12

Note that 1 2 12ˆ ˆ ˆ

OBSS S S S= + ⋅⋅⋅+ where the estimated frequency in each case is determined by only the shape parameter α (since the scale parameter is fixed at 1.0). The unknown parameter may therefore be estimated by maximizing the likelihood function of the sample

1 2 121 2 12( ) ( )* ( )*** ( )S S SL P P Pα α α α=

The maximum likelihood estimator of the GAMBIN parameter α is the value which maximizes the likelihood function L (α).

Examples of the use of the gambin package:

data = a vector of abundances of a set of species

# Fit the gambin distribution

>fit <- fitGambin(data)

#To obtain α and the associated 95% confidence intervals

> fit$Alpha

> confint(fit)

# Plot the distribution

> plot(fit)

# Obtain the the chi-square test statistic, degrees of freedom and associated P value

ˆMLα

Page 4: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

3

> summary(fit)$Chisq

#call the maximum log likelihood value and calculate the various information criterion

> logLik(fit)

>BIC(fit)

>AICc(fit)

#fit the gambin distribution and set ‘subsample’ to the number of individuals in the least

abundant sample (the default value is zero)

>fit <- fitGambin(data, subsample)

Page 5: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

4

Appendix 2. Information regarding the Azorean data and sourced datasets. Requests for data should be directed to PAVB.

Table A1. Site information for the 18 samples from fragments of native Laurisilva forest from the Azores used in this study. For each fragment the area of the fragment, island in which the fragment is located, and the number of arthropod individuals and number of species sampled are given. Each fragment sample was created by pooling the individuals sampled along four transects in each fragment. Arthropods were sampled using a standardised pitfall trap and canopy beating methodology between 1999 and 2004.

Fragment Area (Ha)

Island Number of Individuals

Number of Species

1. Cabeco do Fogo 36 Faial 5586 92 2. Caldeira do Faial 191 Faial 3364 84 3. Lagoas Funda e Rasa 240 Flores 5989 117 4. Morro Alto 1331 Flores 3107 89 5. Caveiro 184 Pico 5578 96 6. Lagoa do Caiado 79 Pico 5766 91 7. Mistério da Prainha 689 Pico 9943 114 8. Pico Pinheiro 74 São Jorge 7366 107 9. Topo 220 São Jorge 4939 102 10. Atalhada 10 São Miguel 3990 130 11. Graminhais 16 São Miguel 2925 84 12. Pico da Vara 306 São Miguel 6204 118 13. Pico Alto 9 Santa Maria 7005 141 14. Biscoito da Ferraria 557 Terceira 5226 116 15. Caldeira Guilherme Moniz 223 Terceira 6027 89 16. Pico Galhardo 38 Terceira 4608 111 17. Serra de Santa Barbara 1347 Terceira 971 68 18. Terra Brava 180 Terceira 4216 103

Page 6: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

5

Table A2. Site information for the 6 Azorean island samples used in this study. Island samples were created by pooling the data from the fragments of native Laurisilva forest located on each island (see Table A1). For each island the combined area of the pooled fragments, number of pooled fragments, and the number of arthropod individuals and species sampled is given. Arthropods were sampled using a standardised pitfall trap and canopy beating methodology between 1999 and 2004. The island of Santa Maria was not included in the island level analysis as it only possesses a single sampled fragment. The minimum number of individuals in a transect, used to standardise transects in our land use comparative analysis is also provided for the three islands (Min. N).

Table A3. Ecological information for the arthropod species used in this study, including species name and associated class, mean body size, origin status, ecological guild, and level of dispersal ability. Origin status refers to whether a species is endemic to the Azores (E), introduced (I), or native but non-endemic (N). Guild: P = predator, H = herbivore, F= Fungivorous, S= Saprophagous. All species presented are arthropods from the Azores and were sampled in the native Laurisilva forest using a standardised pitfall trap and canopy beating methodology between 1999 and 2004.

Class Species Mean Body Size (mm) Origin Guild Dispersal

Ability Arachnida Acorigone acoreensis 1.30 E P High

Arachnida Acorigone zebraneus 1.33 E P High

Arachnida Agyneta decora 2.75 I P High

Arachnida Agyneta rugosa 1.75 E P High

Arachnida Agyneta sp.1 1.26 ? P High

Arachnida Araneus sp.1 4.32 I P Low

Island Number of Fragments

Combined Area (Ha)

Number of Species

Total Number of Individuals

Min. N

1. Faial 2 227 125 8942 2313 2. Flores 2 1571 141 9096 1018 3. Pico 3 952 153 21287 NA 4. São Jorge 2 294 145 12305 NA 5. São Miguel 2 316 178 10194 NA 6. Terceira 4 1788 165 15801 1736

Page 7: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

6

Arachnida Cheiracanthium erraticum 6.63 I P Low

Arachnida Cheiracanthium floresense 8.15 E P Low

Arachnida Cheiracanthium jorgeense 6.25 E P Low

Arachnida Chthonius ischnocheles 2.00 I P Low

Arachnida Chthonius tetrachelatus 1.61 I P Low

Arachnida Clubiona decora 6.00 N P Low

Arachnida Clubiona terrestris 6.00 I P Low

Arachnida Cryptachaea blattea 3.00 I P Low

Arachnida Dysdera crocata 11.25 I P Low

Arachnida Emblyna acoreensis 2.40 E P Low

Arachnida Eperigone sp.1 1.81 I P High

Arachnida Erigone atra 2.33 I P High

Arachnida Erigone autumnalis 2.10 I P High

Arachnida Erigone dentipalpis 2.30 I P High

Arachnida Erigone sp.2 Unknown Unknown P Low

Arachnida Ero furcata 2.75 I P Low

Arachnida Gen. sp.1 Unknown Unknown P Low

Arachnida Gen. sp.2 Unknown Unknown P High

Arachnida Gibbaranea occidentalis 5.50 E P High

Arachnida Homalenotus coriaceus 3.82 N P Low

Arachnida Lasaeola oceanica 2.75 E P Low

Arachnida Lathys dentichelis 2.13 N P Low

Arachnida Leiobunum blackwalli 6.00 N P High

Arachnida Lepthyphantes junipericola 4.00 E P Low

Arachnida Lepthyphantes relictus 3.55 E P Low

Arachnida Lepthyphantesacoreensis 2.58 E P High

Page 8: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

7

Arachnida Lessertia dentichelis 3.05 I P High

Arachnida Macaroeris cata 5.13 N P Low

Arachnida Mangora acalypha 3.25 I P Low

Arachnida Meioneta depigmentata 1.40 E P High

Arachnida Meioneta fuscipalpa 1.90 I P High

Arachnida Mermessus bryantae 2.10 I P High

Arachnida Mermessus fradeorum 2.10 I P High

Arachnida Mermessus trilobatus 2.10 I P High

Arachnida Metellina merianae 6.63 I P Low

Arachnida Microlinyphia johnsoni 3.75 I P High

Arachnida Minicia floresensis 1.98 E P High

Arachnida Neobisium maroccanum 3.00 N P Low

Arachnida Neon acoreensis 1.75 E P Low

Arachnida Neriene clathrata 4.23 I P High

Arachnida Nigma puella 2.56 I P Low

Arachnida Oecobius navus 2.25 I P Low

Arachnida Oedothorax fuscus 2.38 I P High

Arachnida Orchestina furcillata 1.00 E P Low

Arachnida Palliduphantes schmitzi 1.95 N P High

Arachnida Pardosa acorensis 5.50 E P High

Arachnida Pelecopsis parallela 1.50 I P High

Arachnida Pisaura acoreensis 10.50 E P Low

Arachnida Porrhomma borgesi 1.88 E P Low

Arachnida Prinerigone vagans 2.10 I P High

Arachnida Pseudeuophrys vafra 3.75 I P High

Arachnida Rhomphaea nasica 4.00 I P Low

Arachnida Rugathodes acoreensis 1.70 E P High

Arachnida Sancus acoreensis 3.45 E P High

Arachnida Savigniorrhipis acoreensis 1.88 E P High

Page 9: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

8

Arachnida Savigniorrhipis topographicus 1.63 E P Low

Arachnida Steatoda grossa 6.63 I P Low

Arachnida Tenuiphantes miguelensis 2.34 N P High

Arachnida Tenuiphantes tenuis 2.48 I P High

Arachnida Theridion musivivum 3.50 N P Low

Arachnida Walckenaeria grandis 2.10 E P High

Arachnida Xysticus cor 6.00 N P Low

Arachnida Xysticus nubilus 6.00 I P Low

Arachnida Zodarion atlanticum 2.79 I P Low

Chilopoda Cryptops hortensis 15.39 N P Low

Chilopoda Geophilus truncorum 13.78 N P Low

Chilopoda Lithobius pilicornis 21.83 N P Low

Chilopoda Strigamia crassipes 18.19 N P Low

Diplopoda Blaniulus guttullatus 6.83 I S Low

Diplopoda Brachydesmus superus 6.62 I S Low

Diplopoda Brachyiulus pusillus 7.53 I S Low

Diplopoda Brachyiulus sp.1 4.50 I S Low

Diplopoda Choneiulus palmatus 10.43 I S Low

Diplopoda Cylindroiulus latestriatus 8.88 I S Low

Diplopoda Cylindroiulus propinquus 37.44 I S Low

Diplopoda Haplobainosoma lusitanum 10.48 I S Low

Diplopoda Nopoiulus kochii 8.68 I S Low

Diplopoda Ommatoiulus moreletii 40.04 I H Low

Diplopoda Oxidus gracilis 16.00 I S Low

Diplopoda Polydesmus coriaceus 25.38 I S Low

Diplopoda Proteroiulus fuscus 11.28 I S Low

Insecta Acalypta parvula 1.86 N H High

Insecta Acizzia uncatoides 1.82 I H High

Page 10: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

9

Insecta Acrotrichis sp.1 1.32 N S High

Insecta Acupalpus dubius 3.00 N P High

Insecta Acupalpus flavicollis 2.61 N P High

Insecta Acyrthosiphon pisum 0.63 N H High

Insecta Aeolothrips gloriosus 1.36 N P High

Insecta Aeolus melliculus moreleti 8.00 I S High

Insecta Agabus godmani 8.62 E P High

Insecta Agrotis sp.1 33.00 N H Low

Insecta Aleochara bipustulata 3.00 I P High

Insecta Alestrus dolosus 5.48 E H High

Insecta Aloconota sulcifrons 3.45 N P High

Insecta Amara aenea 7.00 I P High

Insecta Amischa analis 3.00 I P High

Insecta Amphorophora rubi 2.30 N H High

Insecta Anaspis proteus 2.00 N H High

Insecta Anisodactylus binotatus 11.00 I P High

Insecta Anoecia corni 1.41 I H High

Insecta Anoscopus albifrons 4.00 N H High

Insecta Anotylus nitidifrons 2.00 I P High

Insecta Aphis craccivora 1.40 N H High

Insecta Aphis sp.1 1.52 Unknown H High

Insecta Aphrodes hamiltoni 2.31 E H High

Insecta Apterygothrips canarius 1.43 I H High

Insecta Apterygothrips sp. 1.57 E H High

Insecta Aptinothrips rufus 1.04 I H High

Insecta Argyresthia atlanticella 3.44 E H High

Insecta Ascotis fortunata azorica 9.50 E H Low

Insecta Atheta amicula 2.00 I P High

Page 11: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

10

Insecta Atheta atramentaria 3.00 I P High

Insecta Atheta dryochares 2.20 E P High

Insecta Atheta fungi 3.00 I F High

Insecta Atheta sp.3 2.36 E P High

Insecta Atheta sp.4 2.42 E P High

Insecta Athous pomboi 9.39 E H High

Insecta Atlantocis gillerforsi 1.65 E F Low

Insecta Atlantopsocus adustus 3.42 N S High

Insecta Aulacorthum solani 0.92 N H High

Insecta Beosus maritimus 6.77 N H High

Insecta Bertkauia lucifuga 1.54 N S High

Insecta Blastobasis sp.1 Unknown N H High

Insecta Blastobasis sp.3 3.82 N H High

Insecta Brachmia infuscatella 5.36 E H High

Insecta Brachysteles parvicornis 2.00 N P High

Insecta Buchananiella continua 2.00 I P High

Insecta Cacopsylla pulchella 2.76 I H High

Insecta Calacalles subcarinatus 2.60 E H High

Insecta Calathus lundbladi 9.00 E P Low

Insecta Caloptilia schinella 4.81 I H High

Insecta Campyloneura virgula 2.80 N P High

Insecta Carpelimus corticinus 3.00 N P High

Insecta Carpophilus fumatus 3.65 I S High

Insecta Carpophilus hemipterus 3.22 I S High

Insecta Carpophilus sp.2 3.16 I S High

Insecta Cartodere bifasciata 1.73 I S High

Insecta Cartodere nodifer 2.00 I S High

Insecta Catops coracinus 3.27 N S High

Insecta Caulotrupis parvus 2.30 E H Low

Page 12: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

11

Insecta Cedrorum azoricus azoricus 15.98 E P Low

Insecta Cedrorum azoricus caveirensis 15.54 E P Low

Insecta Cephennium distinctum 0.89 N S High

Insecta Ceratothrips ericae 1.14 N H High

Insecta Cercyon haemorrhoidalis 2.00 I S High

Insecta Cerobasis cf sp.1 1.95 E S High

Insecta Cerobasis n.sp.4 1.83 E S High

Insecta Cerobasis sp.3 1.10 E S High

Insecta Chrysodeixis chalcites 2.84 N H Low

Insecta Cilea silphoides 5.00 I P High

Insecta Cinara juniperi 1.34 N H High

Insecta Cixius azofloresi 5.00 E H High

Insecta Cixius azomariae 4.00 E H High

Insecta Cixius azopifajo azofa 5.00 E H High

Insecta Cixius azopifajo azojo 6.00 E H High

Insecta Cixius azopifajo azopifajo 4.13 E H High

Insecta Cixius azoricus azoricus 5.00 E H High

Insecta Cixius azoricus azoropicoi 5.00 E H High

Insecta Cixius azoterceirae 5.00 E H High

Insecta Cixius insularis 4.61 E H High

Insecta Closterotomus norwegicus 5.95 N H High

Insecta Coccinella undecimpunctata 4.54 I P High

Insecta Coccotrypes carpophagus 2.00 I H High

Insecta Conocephalus chavesi 15.00 E H High

Insecta Cordalia obscura 2.00 I P High

Insecta Covariella aegopodii 1.65 I H High

Insecta Cryptamorpha desjardinsii 4.00 I P High

Page 13: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

12

Insecta Cryptophagus sp.1 1.44 I S High

Insecta Cryptophagus sp.3 1.37 I S High

Insecta Cryptophagus sp.4 1.55 I S High

Insecta Cryptophagus sp.5 1.47 I S High

Insecta Cryptophagus sp.6 1.60 I S High

Insecta Cryptophagus sp.7 1.82 I S High

Insecta Cyclophora azorensis 18.00 E H Low

Insecta Cyclophora pupillaria granti 20.68 E H Low

Insecta Cyphopterum adcendens 5.00 N H High

Insecta Dilta saxicola 12.00 N S Low

Insecta Drouetius borgesi borgesi 10.37 E H Low

Insecta Drouetius borgesi centralis 8.31 E H Low

Insecta Drouetius borgesi sanctmichaelis 8.20 E H Low

Insecta Dryops algiricus 5.16 N H High

Insecta Dryops luridus 4.60 N H High

Insecta Dysaphis plantaginea 1.35 I H High

Insecta Ectopsocus briggsi 2.00 I S High

Insecta Ectopsocus strauchi 1.44 N S High

Insecta Elipsocus azoricus 2.00 E S High

Insecta Elipsocus brincki 2.48 E S High

Insecta Empicoris rubromaculatus 7.00 I P High

Insecta Epuraea biguttata 2.98 I S High

Insecta Euborellia annulipes 14.64 I P Low

Insecta Eudonia luteusalis 7.62 E H High

Insecta Eupteryx azorica 2.00 E H High

Insecta Eurythrips tristis 1.81 I H High

Insecta Forficula auricularia 16.00 I P Low

Page 14: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

13

Insecta Frankliniella sp.1 0.77 N H High

Insecta Gabrius nigritulus 5.61 I P High

Insecta Gen. sp.1 16.00 I H Low

Insecta Gen. sp.2 1.20 N H High

Insecta Gen. sp.3 1.54 Unknown H Low

Insecta Gen. sp.4 1.64 I H Low

Insecta Gen. sp.5 1.57 Unknown S High

Insecta Gen. sp.6 8.43 E H Low

Insecta Gen. sp.7 1.51 N H Low

Insecta Gen. sp.8 2.91 N H High

Insecta Gen. sp.9 9.00 Unknown P Low

Insecta Gen. sp.10 5.17 Unknown S/H? High

Insecta Gen. sp.11 4.25 N P High

Insecta Gen. sp.12 1.98 I H High

Insecta Gen. sp.13 2.30 E H High

Insecta Gen. sp.14 2.56 I H High

Insecta Gen. sp.15 1.32 I H High

Insecta Gen. sp.16 2.56 I S High

Insecta Gen. sp.17 Unknown Unknown H High

Insecta Gen. sp.18 3.00 Unknown S High

Insecta Gen. sp.19 3.42 I P High

Insecta Gen. sp.20 6.94 I H Low

Insecta Gen. sp.21 5.78 N H Low

Insecta Gen. sp.22 0.93 E H High

Insecta Gen. sp.23 11.48 E H Low

Page 15: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

14

Insecta Gen. sp.24 20.87 N H Low

Insecta Gen. sp.25 4.12 N H Low

Insecta Gen. sp.26 3.93 Unknown H High

Insecta Gen. sp.27 1.14 E H High

Insecta Gen. sp.28 22.00 I H Low

Insecta Gen. sp.29 10.00 E H Low

Insecta Gen. sp.30 12.15 N H Low

Insecta Gen. sp.31 7.23 N H Low

Insecta Gen. sp.32 0.48 I H Low

Insecta Gen. sp.33 1.16 N H High

Insecta Gen. sp.34 4.82 Unknown H High

Insecta Gen. sp.35 1.10 Unknown H Low

Insecta Gen. sp.36 1.34 I S High

Insecta Gen. sp.37 2.20 I H Low

Insecta Gen. sp.38 2.00 I H High

Insecta Gen. sp.39 0.97 Unknown P High

Insecta Gen. sp.40 8.57 N H Low

Insecta Gen. sp.41 6.94 N H Low

Insecta Gen. sp.42 0.29 I H Low

Insecta Gen. sp.43 8.23 Unknown H High

Insecta Gen. sp.44 17.48 Unknown H High

Insecta Gen. sp.45 1.18 Unknown H High

Insecta Gen. sp.46 1.70 Unknown H Low

Insecta Gen. sp.47 9.41 I H Low

Page 16: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

15

Insecta Gen. sp.48 10.73 Unknown H Low

Insecta Gen. sp.49 0.53 I H Low

Insecta Gen. sp.50 1.18 Unknown H High

Insecta Gen. sp.51 3.94 Unknown H High

Insecta Gen. sp.52 2.20 Unknown H High

Insecta Gen. sp.53 0.81 I H Low

Insecta Gen. sp.54 0.85 Unknown H High

Insecta Gen. sp.55 1.91 Unknown S High

Insecta Gen. sp.56 3.33 N H High

Insecta Gen. sp.57 11.61 I H Low

Insecta Gen. sp.58 1.08 E H High

Insecta Gen. sp.59 14.24 I H Low

Insecta Gen. sp.60 23.84 N H Low

Insecta Gen. sp.61 0.64 E H High

Insecta Gen. sp.62 0.39 I H Low

Insecta Gen. sp.63 36.00 N H Low

Insecta Gen. sp.64 10.11 N H Low

Insecta Gen. sp.65 1.13 E H High

Insecta Gen. sp.66 1.98 E P High

Insecta Gen. sp.67 3.71 Unknown H Low

Insecta Gen. sp.68 16.12 N H Low

Insecta Gen. sp.69 0.75 E H High

Insecta Gen. sp.70 2.45 ? P High

Insecta Geotomus punctulatus 4.61 N H High

Insecta Gryllus bimaculatus 32.69 I S High

Page 17: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

16

Insecta Gymnetron pascuorum 2.44 I H High

Insecta Habrocerus capillaricornis 3.89 N P High

Insecta Heliothrips haemorrhoidalis 1.54 I H High

Insecta Hemerobius azoricus 6.00 E P High

Insecta Hercinothrips bicinctus 1.27 I H High

Insecta Heterotoma planicornis 4.19 N P High

Insecta Hipparchia azorina occidentalis 19.12 E H High

Insecta Hipparchia miguelensis 15.29 E H High

Insecta Hoplandrothrips consobrinus 1.95 I H High

Insecta Hoplothrips corticis 2.76 N F High

Insecta Hoplothrips ulmi 2.28 I F High

Insecta Hydroporus guernei 4.25 E P High

Insecta Isoneurothrips australis 1.29 I H High

Insecta Kleidocerys ericae 5.28 N H High

Insecta Lachesilla greeni 0.93 I S High

Insecta Laemosthenes complanatus 13.00 I P Low

Insecta Limnephilus atlanticus 9.14 E P High

Insecta Lindorus lophanthae 2.84 I P High

Insecta Longiunguis luzulella 1.78 I H High

Insecta Loricula coleoptrata 2.03 N P High

Insecta Loricula elegantula 1.94 N P High

Insecta Megamelodes quadrimaculatus 3.00 N H High

Insecta Meligethes sp.2 2.29 I H High

Insecta Meligethes sp.3 2.23 I H High

Insecta Mesapamea storai 11.80 E H High

Page 18: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

17

Insecta Metophthalmus occidentalis 1.40 E S High

Insecta Microplax plagiata 2.45 N H High

Insecta Micrurapteryx bistrigella 2.53 E H High

Insecta Monalocoris filicis 2.22 N H High

Insecta Muellerianella sp.1 3.02 N H High

Insecta Muellerianella sp.2 2.00 N H High

Insecta Muellerianella sp.3 2.38 N H High

Insecta Mythimna unipuncta 21.01 N H High

Insecta Myzus cerasi 0.77 I H High

Insecta Nabis pseudoferus ibericus 9.00 N P High

Insecta Neomariania sp.1 4.74 I H High

Insecta Neomyzus circumflexus 1.12 I H High

Insecta Nesothrips propinquus 0.76 I H High

Insecta Nezara viridula 15.51 I H High

Insecta Nycterosea obstipata 19.00 N H Low

Insecta Nysius atlantidum 3.34 E H High

Insecta Ocypus aethiops 9.00 N P High

Insecta Ocypus olens 17.00 N P High

Insecta Ocys harpaloides 6.00 N P High

Insecta Oinophila cf.flava 3.19 I H High

Insecta Oligota parva 1.00 I P High

Insecta Opogona sacchari 6.23 I H High

Insecta Opogona sp.1 6.70 I H High

Insecta Orius laevigatus laevigatus 2.30 N P High

Insecta Orthochaetes insignis 2.95 N H High

Insecta Otiorhynchus rugosostriatus 11.62 I H Low

Insecta Paranchus albipes 10.00 I P High

Page 19: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

18

Insecta Peripsocus milleri 2.18 N S High

Insecta Peripsocus phaeopterus 2.23 N S High

Insecta Peripsocus subfasciatus 2.70 N S High

Insecta Philaenus spumarius 5.23 I H High

Insecta Phloeonomus n.sp.6 3.73 E P High

Insecta Phloeonomus sp. 4 2.33 Unknown P High

Insecta Phloeonomus sp.1 2.30 N P High

Insecta Phloeonomus sp.3 2.12 I P High

Insecta Phloeopora sp.1 3.40 N P High

Insecta Phloeosinus gillerforsi 3.24 E H High

Insecta Phloeostiba azorica 3.02 E P High

Insecta Phlogophora interrupta 40.00 E H Low

Insecta Pinalitus oromii 4.00 E H High

Insecta Placonotus sp.1 2.37 N P High

Insecta Plinthisus brevipennis 2.56 N H High

Insecta Plinthisus minutissimus 1.44 N H High

Insecta Polymerus cognatus 5.00 N H High

Insecta Proteinus atomarius 1.00 N P High

Insecta Pseudacaudella rubida 0.99 N H High

Insecta Pseudanchomenes aptinoides 12.00 E P Low

Insecta Pseudechinosoma nodosum 2.50 E H Low

Insecta Pseudophloeophagus tenax 4.00 N H High

Insecta Pseudophonus rufipes 13.00 I P/H High

Insecta Ptenidium pusillum 1.00 I S High

Insecta Pterostichus aterrimus aterrimus 9.00 N P High

Insecta Pterostichus vernalis 7.00 I P High

Page 20: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

19

Insecta Quedius curtipennis 11.00 N P High

Insecta Rhopalosiphonimus latysiphon 1.18 I H High

Insecta Rhopalosiphum oxyacanthae 1.70 I H High

Insecta Rhopalosiphum rufiabdominalis 0.88 I H High

Insecta Rhopobota naevana 5.57 I H High

Insecta Rugilus orbiculatus orbiculatus 4.00 N P High

Insecta Saldula palustris 3.17 N P High

Insecta Scolopostethus decoratus 4.71 N H High

Insecta Scopaeus portai 2.29 N P High

Insecta Scoparia coecimaculalis 5.74 E H High

Insecta Scoparia semiamplalis 5.48 E H High

Insecta Scoparia sp.3 2.08 E H High

Insecta Sepedophilus lusitanicus 6.00 N P High

Insecta Sericoderus lateralis 0.50 I P High

Insecta Sitona discoideus 5.79 I H High

Insecta Sitona sp.1 4.27 I H High

Insecta Sitophilus oryzae 3.00 I H High

Insecta Sphenophorus abbreviatus 10.45 I H Low

Insecta Stelidota geminata 2.17 I S High

Insecta Stenolophus teutonus 7.00 N P High

Insecta Stenus guttula guttula 4.00 N P High

Insecta Stilbus testaceus 1.94 N S High

Insecta Strophingia harteni 1.84 E H High

Insecta Tachyporus chrysomelinus 3.00 I P High

Insecta Tarphius azoricus 2.90 E F Low

Insecta Tarphius depressus 4.30 E F Low

Page 21: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

20

Insecta Tarphius n. sp.1 2.59 E F Low

Insecta Tarphius pomboi 3.60 E F Low

Insecta Tarphius rufonodulosus 4.00 E F Low

Insecta Tarphius serranoi 3.17 E F Low

Insecta Tarphius tornvalli 3.10 E F Low

Insecta Tarphius wollastoni 3.90 E F Low

Insecta Thrips atratus 3.07 N H High

Insecta Thrips flavus 1.03 N H High

Insecta Toxoptera aurantii 1.71 I H High

Insecta Trechus terrabravensis 3.35 E P Low

Insecta Trichopsocus clarus 1.83 N S High

Insecta Trigoniophthalmus borgesi 14.11 E S Low

Insecta Trioza laurisilvae 2.10 N H High

Insecta Typhaea stercorea 2.27 I F High

Insecta Uroleucon erigeronense 1.17 I H High

Insecta Valenzuela burmeisteri 0.87 N S High

Insecta Valenzuela flavidus 2.52 N S High

Insecta Xanthorhoe inaequata 13.45 E H Low

Insecta Xestia c-nigrum 15.36 N H High

Insecta Xyleborinus alni 3.14 I H High

Insecta Zetha vestita 5.00 N S High

Page 22: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

21

Full acknowledgements for the sourced datasets used in the model comparison.

Under the condition of use we reproduce the following text (in each case the text was obtained from personal communication with the data holder) from a number of the datasets used in our analyses which require such statement. For the full references for these datasets, and those which are not included here, see Table 1 in the main paper.

BCI data:

"The BCI forest dynamics research project was made possible by National Science Foundation grants to Stephen P. Hubbell: DEB-0640386, DEB-0425651, DEB-0346488, DEB-0129874, DEB-00753102, DEB-9909347, DEB-9615226, DEB-9615226, DEB-9405933, DEB-9221033, DEB-9100058, DEB-8906869, DEB-8605042, DEB-8206992, DEB-7922197, support from the Center for Tropical Forest Science, the Smithsonian Tropical Research Institute, the John D. and Catherine T. MacArthur Foundation, the Mellon Foundation, the Small World Institute Fund, and numerous private individuals, and through the hard work of over 100 people from 10 countries over the past two decades. The plot project is part the Center for Tropical Forest Science, a global network of large-scale demographic tree plots."

Hinkley Point Fish data:

P. A. Henderson and Pisces Conservation Ltd kindly provided the data.

Pasoh Forest:

Pasoh 50ha data collection was funded by STRI/CTFS/FRIM and implemented by the Forest Research Institute Malaysia (FRIM) at the Pasoh FRIM Research Station.

Page 23: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

1

Appendix 3. Supplementary results

Correction added on 10 June 2014, after first online publication of the main part of the paper:

Following publication of the EarlyView version of the paper, two issues with our analyses

were brought to our attention: 1) we were inadvertently fitting a truncated form of the

Poisson lognormal distribution (PLN), and 2) there was a problem with how we derived the

predicted values from the PLN and logseries models. Correcting these issues does not change

our qualitative results; the only difference is that for a single fragment (fragment 12 in Table

2), the PLN now marginally outperforms gambin according to X2. As the Supplementary

material was not published with the EarlyView version of the paper, we have corrected the

following elements of this appendix following manuscript acceptance but prior to first

publication: Table A6, Table A8, and Fig. A1. The following elements of the main paper

were not corrected prior to first publication: Table 2, Fig. 1 and Fig. 2. The versions

appearing in the main paper are as originally published. We include the corrected versions of

these three elements at the end of this Appendix.

Page 24: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

2

Table A4. Simulated lognormal metacommunity results. Four different sets of lognormal metacommunities were simulated, containing a) 50 species, b) 100 species, c) 500 species, and d) 1000 species. Within each of these sets, different metacommunities were simulated, containing a varying number of individuals. There were 9 lognormal metacommunities in total. Five samples were drawn from each metacommunity relating to 1%, 10%, 25%, 50% and 100% of individuals in the metacommunity. For each sample the α parameter of the gambin distribution was calculated, and the maximum octave (maxoct) and modal octave (modal) of the observed abundance distribution (binned into octaves; see methods) were recorded. This process was repeated 100 times and average values were taken.

a) b) Fifty Species

Proportion of community sampled (%)

1000 Individuals 10,000 Individuals

α maxoct modal α maxoct modal 1 0.03 1 0 2.19 3 0

10 0.43 4 0 6.35 6 2 25 1.43 6 1 9.61 8 3 50 2.26 7 2 13.75 9 4

100 3.00 8 3 18.64 10 6

b)

One Hundred Species Proportion of community sampled (%)

1000 Individuals 10,000 Individuals

α maxoct modal α maxoct modal 1 1.06 1 0 1.14 3 0

10 1.45 3 0 4.15 6 2 25 2.15 4 1 5.97 8 4 50 3.15 5 2 8.84 9 5

100 5.21 6 2 12.69 10 6

c)

Five Hundred Species Proportion of community sampled (%)

10,000 Individuals 100,000 Individuals

α maxoct modal α maxoct modal

Page 25: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

3

1 0.5 2 0 1.15 5 0 10 1.56 4 0 4.88 8 3 25 2.92 5 1 7.03 10 4 50 4.49 6 2 9.75 11 6

100 6.97 7 3 12.79 12 7

d)

Proportion of

community sampled

(%)

One Thousand Species 10,000 Individuals 100,000 Individuals 1,000,000 Individuals

α maxoct modal α maxoct modal α maxoct modal 1 0.26 1 0 3.38 3 0 2.6 8 2

10 2.29 4 0 3.44 7 2 9.72 11 6 25 2.3 5 1 6.51 8 4 11.79 13 7 50 2.39 6 1 9.6 9 5 14.75 14 8

100 3.17 7 2 13.29 10 6 18.03 15 9

Page 26: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

4

Table A5. Simulated logseries metacommunity results. Three different sets of logseries metacommunities (i.e. abundances in the metacommunity were distributed according to a logseries distribution) were simulated, containing a) 50 species, b) 100 species, and c) 500 species. Within each of these sets two different metacommunities were simulated, containing one thousand and ten thousand individuals. Thus, there were six logseries metacommunities in total. Five samples were drawn from each metacommunity relating to 1%, 10%, 25%, 50% and 100% of individuals in the metacommunity. For each sample the α parameter of the gambin distribution was calculated and the maximum octave (maxoct) and modal octave (modal) of the empirical abundance distribution (binned into octaves; see methods) were recorded. This process was repeated 100 times and average values were taken.

a)

Proportion of community sampled (%)

Fifty Species 1000 Individuals 10,000 Individuals α maxoct modal α maxoct modal

1 0.67 2 0 1.07 5 0 10 0.48 5 0 1.00 8 0 25 0.39 7 0 0.86 9 0 50 0.46 8 0 0.47 10 0

Complete 0.48 9 0 0.48 11 0

b)

Proportion of community sampled (%)

One Hundred Species 1000 Individuals 10,000 Individuals α maxoct modal α maxoct modal

1 0.18 2 0 1.08 4 0 10 0.51 5 0 0.52 8 0 25 0.39 7 0 0.36 9 0 50 0.39 8 0 0.39 10 0

Complete 0.46 9 0 0.41 11 0

c)

Proportion of community sampled (%)

Five Hundred Species 1000 Individuals 10,000 Individuals α maxoct modal α maxoct modal

1 0.93 2 0 0.69 4 0 10 0.60 5 0 0.48 7 0 25 0.47 7 0 0.35 9 0 50 0.55 8 0 0.55 9 0

Complete 0.65 9 0 0.64 10 0

Page 27: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

5

Table A6. Model selection results for arthropod SADs of six Azorean islands: Faial (1), Flores (2), Pico (3), São Jorge (4), São Miguel (5), and Terceira (6). Arthropods were sampled in native Laurisilva forest using a standardised pitfall trap and canopy beating methodology between 1999 and 2004. For each island the Pearson’s χ² statistic and associated P value (in parentheses) are presented for the gambin, logseries, Poisson lognormal (PLN) and zero-sum multinomial (ZSM) distributions. The Bayesian Information Criterion (BIC) and Akaike’s Information Criterion corrected for small sample size (AICc) are also given. PLN has two parameters and the ZSM has three parameters. Gambin and the logseries are one parameter models. The best model according to each criterion is highlighted in bold for each island. Island information, including sample size, can be found in Table A2 in Appendix 2.

Gambin Logseries PLN ZSM

χ² (P) BIC AICc χ² (P) BIC AICc χ² (P) BIC AICc χ² (P) BIC AICc 1 17.0(0.07) 553.9 553.9 53.5(0.00) 601.4 601.5 22.7(0.01) 568.6 569.3 20.3(0.01) 586.3 589.4 2 22.6(0.10) 614.2 614.3 46.4(0.00) 648.9 648.9 28.1(0.00) 630.9 631.6 26.9(0.02) 636.8 639.0 3 34.7(0.00) 724.8 724.6 63.7(0.00) 767.2 767.0 39.4(0.00) 744.4 744.4 17.7(0.17) 750.1 751.6 4 18.4(0.10) 628.7 628.5 41.6(0.00) 659.1 658.9 28.7(0.00) 646.2 646.3 87.5(0.00) 661.5 663.8 5 18.0(0.08) 828.5 828.5 40.1(0.00) 868.6 868.5 25.9(0.01) 848.3 848.6 109.5(0.00) 855.6 857.8 6 32.1(0.01) 754.5 754.4 38.0(0.00) 776.9 776.8 34.8(0.00) 775.0 775.0 36.6(0.00) 767.9 769.4

Page 28: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

6

Table A7. The proportion of an assemblage/metacommunity required to be sampled in order for the model parameter estimate of the sample to be within specific accuracy thresholds of the parameter estimate of the whole assemblage. The three thresholds are 10% (i.e. the parameter estimate of the sample is within 10% of the assemblage parameter estimate), 25% and 50%. Four metacommunities were simulated, each with 10,000 individuals: 50 species, 100 species, 500 species and 1000 species. For each metacommunity 19 samples were taken, ranging from 5% of individuals in the metacommunity, to 95% of individuals, in intervals of 5%. The three models (gambin, Poisson lognormal, and logseries) were fitted to the samples and the relevant parameters recorded. This process was repeated 100 times and the mean of the parameter values calculated. The proportion of the metacommunity that needed to be sampled in order for the model parameters to be within the three accuracy thresholds was then calculated.

10% Accuracy 25% Accuracy 50% Accuracy 50 Species Gambin α 55% 50% 15% PLN mu 60% 30% 5% LS Alpha 45% 20% 10% 100 Species Gambin α 60% 50% 25% PLN mu 65% 35% 10% LS Alpha 60% 30% 15% 500 Species Gambin α 90% 75% 40% PLN mu 70% 50% 25% LS Alpha 60% 35% 15% 1000 Species Gambin α 65% 35% 5% PLN mu 65% 50% 35% LS Alpha 45% 15% 10%

Page 29: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

7

Table A8. Model comparison results for the sample grain size analysis for three fragments of native Laurisilva forest in the Azores (island in parentheses): a) fragment 4 (Flores), b) fragment 7 (Pico), c) fragment 14 (Terceira). For each fragment the model comparison was conducted using the mean abundance of each species across all transects in that fragment, and then for each combination of two transects and so on iteratively up to n transects. Comparison was between the gambin distribution, the logseries and the Poisson lognormal distribution (PLN). The best model was selected through a maximum likelihood based comparison comprising gambin, the Poisson lognormal and the logseries distributions. A model was selected as best if it had the lowest BIC and AICc values, with the minimum difference in each criterion set at two. Data are for arthropods sampled using a standardised pitfall trap and canopy beating method from 1999 to 2004. The best model for each set of transects is highlighted in bold.

a)

Transect Gambin Logseries PLN BIC AICc BIC AICc BIC AICc

1 219.3 219.7 235.9 236.3 225.5 227.1 2 268.3 268.5 302.9 303.1 273.5 274.6 3 306.7 306.9 346.9 347.1 313.5 314.6 4 313.9 314.0 370.0 370.0 318.8 319.5 5 379.0 379.0 406.9 407.0 389.8 390.5 6 384.7 384.8 428.0 428.1 393.9 394.6 7 392.1 392.0 438.9 438.8 401.5 401.8 8 397.0 396.9 445.8 445.7 406.9 407.3 b)

Transect Gambin Logseries PLN BIC AICc BIC AICc BIC AICc

1 312.0 312.4 345.4 345.8 320.6 322.2 2 370.7 370.9 423.8 424.0 379.5 380.6 3 406.1 406.3 463.3 463.5 415.5 416.6 4 408.7 408.7 496.3 496.3 416.1 416.8 5 503.5 503.6 545.3 545.3 516.7 517.4 6 514.8 514.9 569.7 569.7 526.8 527.5 7 522.3 522.4 577.8 577.8 533.9 534.6 8 528.2 528.1 584.1 584.0 541.9 542.3 c)

Transect Gambin Logseries PLN BIC AICc BIC AICc BIC AICc

1 256.2 256.8 283.9 284.5 263.0 265.2 2 308.1 308.5 354.2 354.5 315.1 316.7 3 354.5 354.7 404.6 404.8 361.9 363.0

Page 30: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

8

4 365.1 365.3 427.2 427.4 372.7 373.8 5 454.7 454.9 478.4 478.6 466.2 467.3 6 461.5 461.6 504.6 504.7 473.0 473.7 7 469.5 469.6 516.6 516.7 481.0 481.7 8 477.6 477.6 523.9 523.9 489.1 489.8

Page 31: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

9

Table A9. Simulation results for the pairwise comparison of different binning methods. The bottom left half of each table represents the mean correlation statistic from a Pearson’s product moment correlation for each pairwise comparison, and the top right half of each table represents the corresponding standard deviation of the pairwise correlation values. The five bases used to bin the data were loge (natural logarithm), log2, log3, log4 and log5. For each run a community was simulated with a) 10,000 individuals, and b) 100,000 individuals. For the first iteration of the run the number of species was set to 50, and at each subsequent iteration this was increased by 50, up to the maximum of 1000 species (i.e. 20 iterations). At each stage the gambin model was fit to the data using the five different binning methods, and the alpha value recorded. At the end of each run a Pearson’s product moment correlation was calculated for each pairwise comparison of alpha sets (i.e. different binning methods) and the values stored. The number of runs was set to 100 and the mean of the correlation values for each pairwise comparison was calculated.

a)

b)

Loge Log2 Log3 Log4 Log5 Loge 0.068 0.095 0.103 0.165 Log2 0.958 0.099 0.095 0.174 Log3 0.911 0.934 0.090 0.163 Log4 0.905 0.938 0.939 0.164 Log5 0.833 0.842 0.837 0.824

Loge Log2 Log3 Log4 Log5 Loge 0.029 0.045 0.056 0.112 Log2 0.909 0.035 0.019 0.109 Log3 0.865 0.927 0.049 0.111 Log4 0.878 0.926 0.803 0.115 Log5 0.880 0.805 0.765 0.807

Page 32: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

10

Supplementary figures

Figure A1. Observed (bars) and expected species according to the gambin distribution (black dots), logseries (red triangles) and Poisson lognormal (PLN; blue squares) for a selection of datasets from across the globe. a) Lepidoptera from the Rothamstead monitoring station, UK (α =1.6); b) marine nematodes from the English Channel (α =1.0); c) trees from the Sherman forest plot, Panama (α =2.5); d) trees (>10cm diameter) from the Barro Colorado Island 50ha plot, Panama (α =3.7); e) British breeding birds (α =8.2); f) marine nematodes from the Irish Sea (α =4.3). For reasons of clarity, the predicted values of the logseries have occasionally been omitted when they are much larger than the observed values. For dataset characteristics and references for these datasets, see Table 1 in the main paper.

Page 33: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

11

Figure A2. Difference in the logseries alpha parameter values (a-c) and the sigma parameter of the Poisson lognormal distribution (d-f) between transects of the three land use types (native forest, exotic plantation forest, and pasture), for three Azorean islands: Terceira (a and d), Faial (b and e), and Flores (c and f). The box plots display the median (red line), the first and third quartiles (black box), and the minimum and maximum values (whiskers). Differences in mean parameter values between land uses on each island were tested for significance using a Wilcoxon rank sum test. The only significant (i.e. P < 0.05) difference in alpha values between land uses, were between native forest and pasture on Terceira (a) and native and exotic forest on Flores (c). There were no significant differences in sigma values.

Page 34: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

12

Figure A3. The sensitivity of alpha to the binning method employed. Each plot represents a comparison of alpha values derived using a different log base binning method (X and Y axes). The different bases used were loge (natural logarithm), log2 (the main method used), log3, log4 and log5. For each base alpha was derived for 18 fragments of native Laurisilva forest in the Azores (see Table A1 in Appendix 2). Each pairwise comparison is significant according to a Pearson’s correlation test (P < 0.001).

Page 35: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

13

Correction added:

As per the correction added statement on page 1 of this appendix, the following elements are corrected versions of Table 2, Fig. 1 and Fig. 2 from the main paper.

Page 36: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

14

Table 2 [corrected]. Goodness of fit and model selection results for arthropod SADs of 18 (No.) native Laurisilva forest fragments in the Azores. Arthropods were sampled using a standardised pitfall trap and canopy beating methodology between 1999 and 2004. For each fragment the Pearson’s c² statistic and associated p value (in parentheses) are presented for the gambin, logseries, Poisson lognormal (PLN) and zero-sum multinomial (ZSM) distributions. The Bayesian information criterion (BIC) and Akaike’s information criterion corrected for small sample size (AICc) are also given for all four distributions. PLN has two parameters and the ZSM has three parameters. Gambin and the logseries are single parameter models. The best model according to each criterion, using a minimum difference of two, is highlighted in bold for each fragment. Fragment information can be found in Supplementary material Appendix 2, Table A1

Gambin Logseries PLN

ZSM

No. χ² (P) BIC AICc χ²  (p) BIC AICc χ² (P) BIC AICc χ² (P) BIC AICc 1 19.6 (0.03) 395.6 395.7 43.1 (0.00) 421.3 421.3 24.0 (0.01) 407.7 408.4 20 (0.04) 414.0 417.1 2 3.8 (0.92) 339.0 339.2 31.9 (0.00) 377.1 377.3 6.6 (0.68) 347.2 348.3 3.9 (0.98) 367.9 371.0 3 11.3 (0.34) 491.2 491.3 27.1 (0.00) 515.4 515.4 16.8 (0.08) 505.5 506.2 19.0 (0.06) 508.1 510.3 4 8.2 (0.51) 365.9 366.1 28.4 (0.00) 393.6 393.8 12.1 (0.21) 375.4 376.5 10.0 (0.63) 386.6 389.7 5 10.9 (0.36) 413.1 413.1 28.9 (0.00) 439.3 439.4 15.0 (0.13) 424.4 425.11 11.3 (0.42) 431.9 434.2 6 23.7 (0.01) 412.1 412.2 54.8 (0.00) 446.9 447.0 28.0 (0.00) 424.2 424.9 33.3 (0.11) 436.5 439.6 7 25.5 (0.00) 526.5 526.5 50.7 (0.00) 563.4 563.5 25.5(0.00) 539.1 539.8 10.5 (0.49) 550.4 552.6 8 19.0 (0.06) 458.4 458.3 36.6 (0.00) 480.4 480.3 26.9 (0.00) 474.1 474.5 44.2 (0.00) 473.6 476.7 9 9.2 (0.51) 418.3 418.3 27.8 (0.00) 445.8 445.8 13.2 (0.21) 427.1 427.8 31.7 (0.00) 442.1 445.2 10 19.5 (0.01) 531.8 532.2 30.9 (0.00) 559.2 559.5 18.7 (0.02) 544.2 545.8 7.8 (0.56) 549.9 554.1 11 7.5 (0.67) 331.2 331.2 19.8 (0.03) 346.8 346.9 11.9 (0.29) 340.1 340.8 70.8 (0.00) 343.7 346.7 12 23.6 (0.00) 510.7 510.9 37.3 (0.00) 538.7 538.9 23.2 (0.01) 523.7 524.8 27.4 (0.01) 529.0 532.1 13 25.2 (0.00) 582.9 582.9 47.1 (0.00) 615.0 615.0 32.8(0.00) 596.6 597.3 28.8 (0.00) 604.5 607.6 14 15.7 (0.07) 485.7 485.9 33.0 (0.00) 514.2 514.4 18.3 (0.03) 497.5 498.6 20.2 (0.31) 505.3 508.4 15 24.9 (0.00) 387.2 387.4 26.2 (0.00) 404.4 404.6 17.0 (0.05) 394.8 395.9 8.5 (0.58) 398.3 401.3 16 28.6 (0.00) 467.1 467.5 31.1 (0.00) 487.1 487.4 22.2 (0.00) 476.6 478.2 32.8 (0.04) 479.7 483.9 17 30.3 (0.00) 244.4 245.2 46.2 (0.00) 261.9 262.8 27.0 (0.00) 250.4 253.5 34.9 (0.04) 258.1 266.3 18 29.9 (0.00) 440.2 440.6 37.5 (0.00) 460.5 460.9 24.8 (0.00) 449.5 451.1 31.3 (0.14) 453.1 457.3

Page 37: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

15

Figure 1 [corrected]. Examples of the fit of the gambin distribution (black dots), the logseries (red triangles), and Poisson lognormal (blue squares) to observed data (bars). For reasons of clarity, the predicted values of the logseries have occasionally been omitted when they are much larger than the observed values. (a) Data are for arthropods from a fragment (No. 2, Supplementary material Appendix 2, Table A1) of native Laurisilva forest on the island of Faial, Azores. The a parameter of the gambin distribution is 1.9. (b) Data are for arthropods from a fragment (7, Supplementary material Appendix 2, Table A1) of native Laurisilva forest on the island of Pico, Azores. The α parameter of the gambin distribution is 1.8. In both plots, gambin provides the best fit according to both BIC and AICc.

Page 38: Ecography ECOG-00861 Matthews, T. J., Borregaard, M. K ... · 1 ( 1) 99( )/100 1 ( ) /0.99 jc x j jc Gxedx α α α α α −− − = ∫ Γ Note that this function defines the link

16

Figure 2 [corrected]. Changing shape of the observed species abundance distribution with sample size. Data were sampled from a simulated lognormal metacommunity (number of individuals (N) = 10 000, number of species = 100). The four plots correspond to different levels of sampling from the metacommunity: (a) 1% of individuals (N = 100) sampled from the metacommunity, (b) 10% of individuals (1000), (c) 50% of individuals (5000), and (d) 100% of individuals, i.e. the full metacommunity (N = 10 000). On each plot the a parameter of the gambin distribution (Gam), the alpha of the logseries distribution (LS), and the mean of the Poisson lognormal distribution (PLN) calculated using the sampled data are given. The fit of the gambin distribution (black dots), PLN (blue squares) and logseries (red triangles) to each sample is also presented. For reasons of clarity, the predicted values of the logseries have been omitted from bin 0 when they are much larger than the observed values.