Insectes soc. 47 (2000) 292–296
0020-1812/00/030292-05 $ 1.50+0.20/0
© Birkhäuser Verlag, Basel, 2000

Insectes Sociaux

Research article

Graphing the survivorship of bees

T.P. Hutchinson

Department of Psychology, Macquarie University, Sydney, N.S.W. 2109, Australia, e-mail: [email protected]

Received 1 September 1998; revised 1 September 1999 and 2 March 2000; accepted 9 March 2000.

Summary. The statistical distribution of lengths of time (for example, of the survival of bees) is often of interest. This paper describes graphical methods that are appropriate for such data, which typically have a skewed distribution. These methods throw light on the hypothesis that the hazard rate is constant. Data published by Visscher and Dukas (1997) appear to show an increasing hazard rate.

Key words: Bee survival, hazard rate plotting, increasing hazard rate, survivorship, Weibull distribution.

The length of a period of time until some event is often the subject of study. The duration of survival of bees, for example, has recently been of interest to Katayama (1996) and Visscher and Dukas (1997). The present note makes some suggestions for the analysis of such data. What is most often done is to present such data by tabulation and by drawing a histogram. Typically, these reveal that the distribution of the data is skewed, with the mean being substantially greater than the median. But that is about all – the eye is unable to judge anything more from the shape of the histogram. This note will describe two other methods of plotting this type of data.

The paper to be given greatest consideration here is that by Visscher and Dukas (1997). According to them, an exponential distribution is a good fit to the total times, in their whole lives, that bees spend foraging (and this applies to the number of days on which they foraged, and to the number of foraging trips, as well as to the total time spent foraging). Visscher and Dukas had individually observed 33 bees. What they showed in their Figure 2 was that a histogram of the total foraging time of these bees was roughly exponential in shape. At this point, we need a little notation: F will be the proportion of observations with values less than or equal to a given number x. As well as a histogram, Visscher and Dukas also showed a graph (their Fig. 1A) of – ln (1 – F) versus x (where ln means natural logarithm). For the exponential distribution, F = 1 – exp (– bx) (where b is a scale factor that in the case of the exponential distribution equals the reciprocal of the mean); consequently, – ln (1 – F) = bx, which is proportional to x. The appropriateness or otherwise of the exponential distribution may be judged by whether this can empirically be seen to be true. This format of plotting is thus an improvement on 1 – F vs. x. It is a fairly popular format in entomology – examples include Biesmeijer and Tóth (1998), Garófalo (1978), Goldblatt and Fell (1987), Rodd et al. (1980), and Strassmann et al. (1997) (in this last example, the units that were surviving or dying were nests of wasps, not individual wasps). Nevertheless, this may not be quite the best type of graph to draw, and two further types of graph will be described below. Incidentally, note that the exponential distribution predicts that – ln (1 – F) is proportional to x, not merely that the relationship is linear. So the appearance of linearity is not sufficient to support the exponential distribution; the plotted graph needs to be linear through the origin.
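The distinction between proportionality and mere linearity can be checked numerically. The following Python sketch is only an illustration, not part of the original analysis: it takes the grouped counts given in Part (i) of Table 1 of this paper and computes the ratio – ln (1 – F)/x at the end of each 10-hour period. The variable names are assumptions made for the example. Under an exponential distribution this ratio would equal the same constant b at every x.

```python
import math

# Deaths per 10-hour period, from Part (i) of Table 1 (33 bees in total).
deaths = [7, 7, 6, 6, 2, 2, 1, 1, 1]
period_ends = [10 * (k + 1) for k in range(len(deaths))]  # 10, 20, ..., 90 hours
total = sum(deaths)

ratios = []          # (x, -ln(1 - F) / x) for each period end with F < 1
cumulative = 0
for x, n in zip(period_ends, deaths):
    cumulative += n
    F = cumulative / total
    if F < 1:        # -ln(1 - F) is undefined once every bee has died
        ratios.append((x, -math.log(1 - F) / x))

for x, r in ratios:
    print(f"x = {x:2d} h   -ln(1 - F)/x = {r:.4f}")
```

A ratio that drifts with x, rather than staying level, indicates that the plotted points do not lie on a straight line through the origin.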

Weibull plotting of survivorship

A cursory inspection of the graphs presented by Visscher and Dukas could suggest that the exponential model may be appropriate. But it adds greatly to the credibility of a model if it can be embedded within a wider class of models, and the data be seen to follow the simple model as closely as they follow any. In the case of the exponential distribution, a common choice for a wider class of models is the Weibull distribution (Antle and Bain, 1988). For the Weibull distribution, F = 1 – exp [– (bx)^a] (so the exponential distribution is equivalent to a being 1). One reason for using the Weibull distribution is that it lends itself to graphical analysis, as follows.

For the Weibull distribution, a sequence of two logarithmic transformations results in the equation ln [– ln (1 – F)] = a ln b + a ln x. The importance of this is that if ln x is plotted horizontally and ln [– ln (1 – F)] is plotted vertically, the Weibull distribution will give a straight line, the slope of which is a; if the exponential distribution is followed, the slope will be 1. (Special graph paper can be obtained that does away with the need to convert x to ln x and F to ln [– ln (1 – F)]: the grid on such graph paper is not equally spaced, but is unequally spaced in just the right way.) Thus failure of the exponential distribution shows up as a slope that is different from 1, something the eye can judge quite readily.
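For readers who want the two logarithmic transformations spelled out, the steps can be written as follows, using only the symbols F, a, b and x already defined for the Weibull distribution:

```latex
\begin{align*}
F &= 1 - \exp\!\left[-(bx)^{a}\right] \\
1 - F &= \exp\!\left[-(bx)^{a}\right] \\
-\ln(1 - F) &= (bx)^{a} \\
\ln\!\left[-\ln(1 - F)\right] &= a \ln b + a \ln x
\end{align*}
```

Setting a = 1 in the third line recovers the exponential case, – ln (1 – F) = bx.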

By way of example, Part (i) of Table 1 shows some data (read from Fig. 2 of Visscher and Dukas), Part (ii) shows the results of the calculations, and Fig. 1 shows the graph. A line of slope 1 – this is the slope that corresponds to an exponential distribution – is included on the graph; it can be seen that the observations have a rather greater slope, approximately 1.33. Thus we can conclude that a Weibull distribution with an a of approximately 1.33 is a rather better fit to the data than an exponential distribution is. The SAS package provides a convenient way of estimating the parameters of a Weibull distribution, using the maximum likelihood criterion. (With the data being called blife, the command was: proc lifereg; model blife = /dist = weibull;) The result was an estimated a of 1.39. (The data used as input were the 33 exact times, kindly supplied by Kirk Visscher, not the grouped data of Table 1.) The fits of the Weibull and exponential distributions may be compared using the difference between their log-likelihoods. The resulting chi-squared (1 d.f.) statistic is 4.9, which enables us to reject the hypothesis of an exponential distribution.
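As a hedged illustration of the final step, the sketch below converts the reported statistic of 4.9 into a tail probability. It assumes the usual likelihood-ratio convention, in which the statistic is twice the difference between the two maximised log-likelihoods and is referred to a chi-squared distribution with 1 d.f.; the function name is hypothetical, and only the Python standard library is used.

```python
import math

def chi2_1df_pvalue(statistic: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 d.f.

    With 1 d.f. the variable is the square of a standard normal Z, so
    P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))

# 4.9 is the statistic reported in the text for Weibull versus exponential.
p = chi2_1df_pvalue(4.9)
print(f"p = {p:.3f}")
```

The resulting probability lies below the conventional 0.05 level, which is consistent with the rejection of the exponential hypothesis stated above.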

Because it is based on cumulated data, a graph like Fig. 1 will always be quite smooth, and will emphasise broad features of the data. The next section describes a graph based on data for individual time periods, which therefore emphasises details in the data.

Table 1. Basic data and calculations for Figures 1 and 2

Part (i)                  Part (ii)                       Part (iii)
Hours, x    Deaths, n     ln x    ln [– ln (1 – Fx)]      f / (1 – F)
0–10        7             2.30    –1.43                   0.0237
10–20       7             3.00    –0.59                   0.0311
20–30       6             3.40    –0.07                   0.0375
30–40       6             3.69    0.44                    0.0600
40–50       2             3.91    0.64                    0.0333
50–60       2             4.09    0.87                    0.0500
60–70       1             4.25    1.03                    0.0400
70–80       1             4.38    1.25                    0.0667
80–90       1             –       –                       0.2000

• Part (i) shows the number, n, of bees that die in the respective 10-hour periods of their foraging. (Total number of bees = 33.)

• Part (ii) shows the results of the calculations required for the graphical study of the Weibull hypothesis: ln x is the natural logarithm of the end of the time period, and Fx is the proportion of bees that are dead after x hours. For example, the first entries are ln 10 and ln [– ln (1 – (7/33))].

• Part (iii) shows the probability of death per unit time: f is the proportion of bees that die during a particular time period divided by the length of the time period, and 1 – F is the proportion of bees that are alive at this time period (average of the beginning and the end). For example, for the first entry in this column, f is (7/33)/10 and 1 – F is 29.5/33, so f / (1 – F) is 0.0237.
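The calculations described in these notes can be reproduced with a few lines of Python. The sketch below recomputes Parts (ii) and (iii) from the counts in Part (i). The ordinary least-squares slope at the end is an addition for illustration only: the lines in Fig. 1 were not fitted to the data, and the maximum likelihood estimate quoted in the text was obtained from the 33 exact times, not from these grouped counts. All names are hypothetical.

```python
import math

deaths = [7, 7, 6, 6, 2, 2, 1, 1, 1]   # Part (i): deaths per 10-hour period
width = 10.0                            # length of each period, in hours
total = sum(deaths)                     # 33 bees

part_ii = []    # (ln x, ln[-ln(1 - F)]) at the end of each period
part_iii = []   # (mid-point of period, hazard rate f / (1 - F))
alive_start = total
for k, n in enumerate(deaths):
    x_end = width * (k + 1)
    alive_end = alive_start - n
    F = 1 - alive_end / total
    if F < 1:                           # the double log is undefined at F = 1
        part_ii.append((math.log(x_end), math.log(-math.log(1 - F))))
    f = (n / total) / width             # proportion dying per hour
    surviving = 0.5 * (alive_start + alive_end) / total  # average of start and end
    part_iii.append((x_end - width / 2, f / surviving))
    alive_start = alive_end

def least_squares_slope(points):
    """Ordinary least-squares slope of y on x (an illustrative choice)."""
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx

slope = least_squares_slope(part_ii)
for ln_x, y in part_ii:
    print(f"ln x = {ln_x:.2f}   ln[-ln(1 - F)] = {y:5.2f}")
for mid, h in part_iii:
    print(f"mid-point = {mid:4.0f} h   hazard = {h:.4f}")
print(f"least-squares slope of Part (ii): {slope:.2f}")
```

The least-squares slope depends on giving every plotted point equal weight, so it need not coincide exactly with a slope judged by eye or with a maximum likelihood estimate.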

Figure 1. Weibull plot of survivorship. (The straight lines have not been fitted to the data; they are shown only to indicate what different slopes look like. The shallower has a slope of 1.0 and the steeper has a slope of 1.33.)

Figure 2. Hazard rate versus time: the quantities from Part (iii) of Table 1 are plotted against the mid-points of the corresponding time periods, i.e., 5 hours, 15 hours, and so on. (Note the change of scale for the final data point.)

Plotting the hazard rate

Visscher and Dukas claimed their data supported the idea of a constant probability of death. This conclusion follows from the distribution of foraging time being exponential. But Visscher and Dukas’ conclusion of exponentiality was based upon graphs which, as argued earlier, only give a rough idea of the shape of the distribution.

It would be better, as O’Donnell and Jeanne (1992) and Strassmann (1985) did, to directly plot

(number of deaths per unit time during a time period) / (number alive during the time period)

and examine whether this is constant. The words used in the above ratio are intended to capture the general concept, and the reader needs to be aware that interpretations may differ in detail. In Table 1 (and in Fig. 2), what we might term the point of reference is at the mid-point of a time period: the denominator is the average of what it is at the beginning and the end, and the hazard rate is plotted versus the mid-point of the time period. Instead of this, the denominator is sometimes taken to be the number alive at the beginning of the time period, and then the hazard rate is plotted versus the beginning of the time period.

The ratio just defined is the current probability of death (per unit time) of a bee that is still alive. Analogous concepts are used by reliability engineers concerned with the distribution of lengths of lifetime of equipment, and by actuaries concerned with the distribution of human lifetimes. Names like “hazard rate” and “force of mortality” are used. Patel (1983) gives a definition of hazard rate, along with much basic information. An important property of the exponential distribution is that this hazard rate is constant. For the Weibull distribution, the hazard rate is proportional to x^(a – 1), and so is smoothly increasing when a exceeds 1. Notice the contrast between the hazard rate, which is based upon the number of bees still alive at a specified time, and the quantity (number of deaths per unit time during a time period)/(number who entered the study at time zero). This latter ratio is not relevant when we want to know the risk to a bee that is known to be alive at a specified time, with bees that have already died being excluded from the population at risk.
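The statement about the Weibull hazard rate can be verified directly from the distribution function F = 1 – exp [– (bx)^a]. Writing f for the probability density (the derivative of F) and h for the hazard rate, a short derivation is:

```latex
\begin{align*}
f(x) &= \frac{dF}{dx} = a\,b\,(bx)^{a-1} \exp\!\left[-(bx)^{a}\right] \\
1 - F(x) &= \exp\!\left[-(bx)^{a}\right] \\
h(x) &= \frac{f(x)}{1 - F(x)} = a\,b\,(bx)^{a-1} = a\,b^{a}\,x^{a-1}
\end{align*}
```

With a = 1 this reduces to the constant b, the exponential case; with a greater than 1 the power of x is positive, so h increases with x.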

Results of calculations on the data of Visscher and Dukas are shown in Part (iii) of Table 1, and are plotted in Fig. 2.

There are many concepts in the reliability literature available to describe data on lifetimes. (See, for example, Patel, 1983.) One of these is the idea of increasing hazard rate (IHR), which means that when a graph like Fig. 2 is plotted, it slopes upwards throughout. It appears from Fig. 2 that these data have the IHR property, not constant hazard rate. Note that a graph like Fig. 2 enables a lot of detail to be seen, and so one needs to beware of the danger of over-interpreting its features – that is, of trying to give a meaning to something that is merely random fluctuation. Fig. 2 is shown in order to demonstrate the technique of plotting such graphs, and the positions of the last few points on the right of the graph are very imprecise owing to there being very few bees still alive.

Some technical details

This paper has aimed at a clear presentation of the basics of graphing survivorship. The present section examines some details.

Exact survival times. The calculations in Table 1 are appropriate for grouped (tabulated) data. If the exact times of survival of all the bees are known, the procedure is as follows. Let the number of bees be N. First, assign to the survival times ranks 1 to N (shortest to longest survival). Second, for an observation x that is ranked i, F is estimated to be i/(N + 1). Finally, plot ln [– ln (1 – [i/(N + 1)])] against ln x. (Some people prefer to estimate F by (i – 0.5)/N, or by some other function of i and N: the phrase used for this is “plotting position”, and a useful reference is Crowder et al. (1991, Section 2.9).)
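The three steps for exact survival times can be sketched in Python as follows. The function name, the option labels and the example times are all hypothetical (the 33 exact observations are not reproduced in this paper), and tied times are simply ranked in sorted order, which is an assumption of this sketch.

```python
import math

def weibull_plot_points(times, position="i/(N+1)"):
    """Return (ln x, ln[-ln(1 - F)]) pairs for exact survival times.

    position="i/(N+1)"  estimates F by i / (N + 1)
    position="midpoint" estimates F by (i - 0.5) / N
    """
    ordered = sorted(times)             # rank 1 = shortest survival
    n = len(ordered)
    points = []
    for i, x in enumerate(ordered, start=1):
        if position == "midpoint":
            F = (i - 0.5) / n
        else:
            F = i / (n + 1)
        points.append((math.log(x), math.log(-math.log(1 - F))))
    return points

# Made-up survival times in hours, for illustration only.
example_times = [30.0, 10.0, 20.0]
for ln_x, y in weibull_plot_points(example_times):
    print(f"ln x = {ln_x:.3f}   ln[-ln(1 - F)] = {y:.3f}")
```

Both plotting positions keep F strictly between 0 and 1, so the double logarithm is defined for every observation, including the longest survivor.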

Smooth histograms. Starting from exact survival times, the construction of a histogram is arbitrary in two ways. One is the width of the classes, e.g., should they be 600 minutes (10 hours) as in Table 1, or 500 minutes, or 1000 minutes, or what? The other is where these classes should be located, e.g., they are at 0–600, 600–1200, etc. in Table 1, but there is no reason why they should not be at 100–700, 700–1300, etc., or at 200–800, 800–1400, etc. The second of these two arbitrary choices can be eliminated these days: a computer can easily count up the number of observations between x – 300 and x + 300 for any given x. So we get it to do this for x = 1, 2, 3, …, and plot the result. This gives a fairly smooth curve that is the equivalent of a histogram. (The first arbitrary element, the width of the classes, remains.) There are a number of variations on this idea, and some of them give an even smoother curve. The term used is “kernel estimator” of probability density; see Wegman (1983). At the time of writing, the website http://www.stat.sc.edu/rsrch/gasp/density/model1.html will perform the necessary calculations and plot the curve very easily.
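The moving-window count is easy to program. The sketch below is one possible version in Python: the function name and the example times are hypothetical, and treating both window end-points as inclusive is an assumption, since the text does not say how observations exactly at x – 300 or x + 300 should be handled.

```python
from bisect import bisect_left, bisect_right

def moving_window_counts(times, half_width, grid):
    """Count observations in [x - half_width, x + half_width] for each x in grid."""
    ordered = sorted(times)
    counts = []
    for x in grid:
        lo = bisect_left(ordered, x - half_width)    # first index with t >= x - half_width
        hi = bisect_right(ordered, x + half_width)   # first index with t > x + half_width
        counts.append(hi - lo)
    return counts

# Made-up foraging times in minutes (NOT the data of Visscher and Dukas).
example_times = [120, 340, 410, 650, 900, 1280, 1500, 2100, 2450, 3300]
grid = range(0, 3601, 300)
for x, c in zip(grid, moving_window_counts(example_times, 300, grid)):
    print(f"x = {x:4d} min   count within 300 min = {c}")
```

Dividing each count by the number of observations times the window width (here 600 minutes) would turn the counts into an estimate of probability density on the same footing as a histogram.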

Smooth hazard plots. I do not know of software that produces a smooth estimate of the hazard curve quite so easily, but the following method could be used. Choose a window width, 2w. Count the number of deaths between x – w and x + w, and call this n1. Count the number of survivors at time x, and call this n2. Then n1/(2w n2) is an estimate of the proportion of surviving individuals that die per unit time at time x. Repeat this at as many different values of x as desired. For smooth hazard plots, see Klein and Moeschberger (1997, Section 6.2).
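A minimal Python sketch of this windowed hazard estimate follows. The function name and data are hypothetical; counting as a survivor at time x any individual whose death time exceeds x, and treating the window end-points as inclusive, are assumptions of the sketch. No estimate is returned once nobody is left alive.

```python
def smooth_hazard(times, x, w):
    """Estimate the hazard at time x as n1 / (2 * w * n2).

    n1 = number of deaths in [x - w, x + w]
    n2 = number of individuals still alive at time x (death time > x)
    Returns None when n2 is zero, because the ratio is then undefined.
    """
    n1 = sum(1 for t in times if x - w <= t <= x + w)
    n2 = sum(1 for t in times if t > x)
    if n2 == 0:
        return None
    return n1 / (2 * w * n2)

# Made-up death times in hours, for illustration only.
example_times = [5, 15, 25, 35]
for x in (10, 20, 30, 40):
    print(f"x = {x} h   estimated hazard = {smooth_hazard(example_times, x, 10)}")
```

As with the grouped hazard rate, estimates at large x rest on very few survivors and should be read with corresponding caution.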

Discussion of some limitations

The chief purpose of this note has been to draw attention to some simple techniques of data presentation that seem not to be well known among entomologists. In addition, some comment has been made about one of the substantive issues in Visscher and Dukas. Fig. 1 suggests that if a Weibull distribution is assumed, its a is appreciably greater than 1, and Fig. 2 suggests that hazard rate is increasing, up to about 30 h of foraging (as mentioned earlier, the last few data points are imprecise). Before closing, some warnings about the techniques should be given.

Alternatives to the Weibull distribution. One should not get over-excited on discovering that the Weibull distribution is a good description of some data: when this is the case, it is usually also true that other distributions of the same general shape – such as the lognormal, and the gamma – are also good descriptions. For the lognormal distribution (but not for the gamma), there exists a method of plotting the data that is similar to that described in this paper for the Weibull distribution, and that gives a straight line when the lognormal distribution is appropriate.
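The text does not spell out the plotting method for the lognormal distribution. One standard possibility, assumed here, is the normal probability plot of ln x: if ln x is normally distributed with mean m and standard deviation s, then the standard normal quantile of F equals (ln x – m)/s, which is a straight line in ln x. The Python sketch below computes the points to plot; the function name and data are hypothetical, and the plotting position i/(N + 1) is borrowed from the section on exact survival times.

```python
import math
from statistics import NormalDist

def lognormal_plot_points(times):
    """Return (ln x, z) pairs, where z is the standard normal quantile of F.

    F is estimated by i / (N + 1) for the observation ranked i.
    """
    ordered = sorted(times)
    n = len(ordered)
    std_normal = NormalDist()           # mean 0, standard deviation 1
    return [(math.log(x), std_normal.inv_cdf(i / (n + 1)))
            for i, x in enumerate(ordered, start=1)]

# Made-up survival times, for illustration only.
for ln_x, z in lognormal_plot_points([4.0, 9.0, 15.0, 32.0, 70.0]):
    print(f"ln x = {ln_x:.3f}   normal quantile = {z:6.3f}")
```

On such a plot the slope estimates 1/s, so it plays a role loosely comparable to the slope a on the Weibull plot.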

Subpopulations. It may be plausible that there exist subpopulations of bees – for example, from different gene pools, or exposed to different environmental conditions. In this case, it is perfectly possible for the Weibull distribution to be a good description of the total population, but not of any of the subpopulations; or the reverse could be the case.

Explanatory variables. Fitting a distribution to a set of lifetimes has the inherent limitation that there are no explanatory variables (e.g., environmental conditions, or the nature of the bee’s activity). When these are available, an approach analogous to that in other contexts would be to model the hazard to a specified bee multiplicatively – that is, as a baseline hazard at time x multiplied by a factor depending on the characteristics of the specified bee. For such methods, see the books that deal with methods of handling lifetime data (mostly in either engineering or medical contexts) – e.g., Klein and Moeschberger (1997), Lee (1992), or Crowder et al. (1991).

Interpretation of the slope, a. It is tempting to reason as follows: “Fig. 1 has a slope greater than 1; the Weibull distribution with a greater than 1 has an increasing hazard rate; therefore, these data exhibit increasing hazard rate”. This reasoning is appropriate for the range of the data covered by the graph – which might be ln [– ln (1 – F)] ranging from – 1 to 1, which means that F ranges from 0.31 to 0.93. However, with hazard rate, there is a tendency to be especially interested in its behaviour in the tail of the distribution (say, for F between 0.9 and 1), and a feature that is characteristic of most of the distribution is not necessarily characteristic of its tail. To make a statement about hazard rate, one really needs to look directly at the graph of this (as in Fig. 2).
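The correspondence between the vertical scale of the Weibull plot and F can be checked by inverting the double logarithm. The small Python sketch below does this for the two end-points mentioned above; the function name is hypothetical.

```python
import math

def F_from_double_log(y):
    """Invert y = ln[-ln(1 - F)] to recover F = 1 - exp(-exp(y))."""
    return 1 - math.exp(-math.exp(y))

low = F_from_double_log(-1)   # lower end of the plotted range
high = F_from_double_log(1)   # upper end of the plotted range
print(f"F ranges from {low:.2f} to {high:.2f}")
```

The same function shows how little of the vertical axis is devoted to the tail: values of F above 0.9 all lie at heights greater than about 0.83 on this scale.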

Lack of data. A problem with hazard rate plots such as Fig. 2 is that the last few points are based on very little data, and hence are imprecise. One possibility is to say that times greater than (for example) 50 hours are relevant to less than 20% of the bees, that data are always likely to be sparse in this region, and there is no point in attempting to draw conclusions here. An alternative possibility, if one does not wish to ignore these data completely, is to plot the hazard rate against F (instead of against x). This will bring the last few data points closer together in the horizontal direction – and the eye may get a correct impression of random scatter, rather than a probably wrong impression of a complicated sequence of changes in hazard rate.

Limitations of statistical testing. Many methods have been developed by statisticians and others to squeeze the maximum information out of data. It is tempting to use such methods to the uttermost when the basic data require meticulous observation of individual insects, as those of Visscher and Dukas did. However, I would make a distinction between statistical methods for exploring data and those for inference (testing a null hypothesis). The statistical study of bee lifetimes being at a fairly early stage, I suggest the chief thing we should do with data is to probe for ideas, not test the ideas. The reason is as follows. Suppose a dataset suggests something a little complicated is true. That naturally prompts us to select a statistical test that is sensitive to the feature we have noticed in the data. But this is unfair – very likely, we will reject the null hypothesis in favour of Feature A if Feature A is what suggested the statistical test! (If, instead, Feature B had been present, then we would not have done a test for Feature A, but would have done one for Feature B instead.) If we had some reason to select a statistical test sensitive to Feature A before we saw our data (perhaps we had noticed this in other data, or perhaps it was suggested by some theory), then it would be fair to conduct the test. For example, having seen in Fig. 2 a suggestion of an increasing hazard rate, it would be appropriate to submit a different dataset to a statistical test that is sensitive to this – perhaps one of the tests described by Patel (1983).

Acknowledgements

I thank Kirk Visscher for letting me know the 33 exact observations of minutes spent foraging, Dom Adair for stimulating discussions, my colleague Alan Taylor for help with SAS, and the referees for several helpful comments.

References

Antle, C.E. and L.J. Bain, 1988. Weibull distribution. In: Encyclopedia of Statistical Sciences, Volume 9, Wiley, New York, pp. 549–556.

Biesmeijer, J.C. and E. Tóth, 1998. Individual foraging, activity level and longevity in the stingless bee Melipona beecheii in Costa Rica (Hymenoptera, Apidae, Meliponinae). Insectes soc. 45: 427–443.

Crowder, M.J., A.C. Kimber, R.L. Smith and T.J. Sweeting, 1991. Statistical Analysis of Reliability Data. Chapman and Hall, London.

Garófalo, C.A., 1978. Bionomics of Bombus (Fervidobombus) morio. 2. Body size and length of life of workers. J. Apic. Res. 17: 130–136.

Goldblatt, J.W. and R.D. Fell, 1987. Adult longevity of workers of the bumble bees Bombus fervidus (F.) and Bombus pennsylvanicus (DeGeer) (Hymenoptera: Apidae). Can. J. Zool. 65: 2349–2353.

Katayama, E., 1996. Survivorship curves and longevity for workers of Bombus ardens SMITH and Bombus diversus SMITH (Hymenoptera, Apidae). Jap. J. Entomol. 64: 111–121.

Klein, J.P. and M.L. Moeschberger, 1997. Survival Analysis. Techniques for Censored and Truncated Data. Springer, New York.

Lee, E.T., 1992. Statistical Methods for Survival Data Analysis. Wiley, New York.

O’Donnell, S. and R.L. Jeanne, 1992. Lifelong patterns of forager behaviour in a tropical swarm-founding wasp: Effects of specialization and activity level on longevity. Anim. Behav. 44: 1021–1027.

Patel, J.K., 1983. Hazard rate and other classifications of distributions. In: Encyclopedia of Statistical Sciences, Volume 3, Wiley, New York, pp. 590–594.

Rodd, F.H., R.C. Plowright and R.E. Owen, 1980. Mortality rates of adult bumble bee workers (Hymenoptera: Apidae). Can. J. Zool. 58: 1718–1721.

Strassmann, J.E., 1985. Worker mortality and the evolution of castes in the social wasp Polistes exclamans. Insectes soc. 32: 275–285.

Strassmann, J.E., C.R. Solis, C.R. Hughes, K.F. Goodnight and D.C. Queller, 1997. Colony life history and demography of a swarm-founding social wasp. Behav. Ecol. Sociobiol. 40: 71–77.

Visscher, P.K. and R. Dukas, 1997. Survivorship of foraging honey bees. Insectes soc. 44: 1–5.

Wegman, E.J., 1983. Kernel estimators. In: Encyclopedia of Statistical Sciences, Volume 4, Wiley, New York, pp. 369–370.