
CLINICAL MICROBIOLOGY REVIEWS, July 1990, p. 219-226
Vol. 3, No. 3
0893-8512/90/030219-08$02.00/0
Copyright © 1990, American Society for Microbiology

Statistical Methods in Microbiology

DUANE M. ILSTRUP

Section of Biostatistics, Mayo Medical School, Mayo Clinic and Mayo Foundation, Rochester, Minnesota 55905

INTRODUCTION
THE NATURE OF VARIABLES
DESCRIPTIVE ANALYSES
INDEPENDENT VERSUS DEPENDENT OBSERVATIONS
HISTORICAL VERSUS PROSPECTIVE STUDIES
ESTIMATION VERSUS TESTING, ERROR RATES, AND STATISTICAL POWER
STATISTICAL HYPOTHESIS TESTS
  Two Samples with Dependent or Paired Observations
    Nominal data
    Continuous Gaussian data
    Continuous non-Gaussian data
    Ordinal data
  Two Samples with Independent Observations
    Nominal data
    Continuous Gaussian data
    Continuous non-Gaussian data
    Ordinal data
  Three or More Samples with Dependent Observations
  Three or More Samples with Independent Observations
OTHER STATISTICAL METHODS
EVALUATING NEW DIAGNOSTIC TESTS
  True Patient Status Is Known
    Negative or positive diagnostic test
    Ordinal diagnostic test
  Unknown True Patient Status
SUMMARY
LITERATURE CITED

INTRODUCTION

With as few formulas and with as little theory as possible, statistical methods are described that are usually appropriate in specific experimental situations. These methods are not described in detail, but references are given that will allow the interested reader to study them more thoroughly. Before specific methods can be described, however, the reader must be able to recognize the nature of the variables being studied and whether the study observations are dependent or independent.

THE NATURE OF VARIABLES

Study variables may be classified into three general types: nominal, ordinal, and continuous.

Nominal variables are those that take on only a finite (usually small) number of categories, when the categories have no logical ordering. Examples of nominal variables are death (no or yes) of an experimental animal in an antibiotic study and growth or no growth of an organism in a culture medium investigation.

Ordinal variables are those that also take on a finite number of categories, but the categories have a logical ordering to them. An example of an ordinal variable is a level of intensity, growth, or cytopathic effect that is negative, +1, +2, etc.

Continuous variables are measured variables that usually are limited in number only by the precision of the instrument measuring the variable. A commonly assumed distribution of a continuous variable is the Gaussian or normal distribution with its familiar bell-shaped form. Unfortunately, very few variables in medicine and in microbiology have mathematically precise Gaussian underlying distributions (14).

Radioactivity counts, counts of CFU or fluorescence-forming units, and time in hours or days to growth of an organism in culture all tend to have skewed distributions (distributions in which most results are concentrated on one end of the distribution with a long tail of values extending in the opposite direction). When continuous variables are approximately Gaussian, parametric statistical methods may be used, but when the distributions are markedly non-Gaussian and cannot be made Gaussian by a mathematical transformation such as the logarithm, nonparametric methods should be used. Both parametric and nonparametric methods are discussed later, but nonparametric methods are emphasized because of the non-Gaussian nature of many microbiological measurements.
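The effect of a logarithmic transformation on a skewed distribution is easy to see numerically. The sketch below is not from the article; the data are hypothetical CFU-like counts drawn from a right-skewed (lognormal) distribution, and the sample-skewness formula is one common convention:

```python
import math
import random

def skewness(xs):
    """Sample skewness: average cubed deviation divided by the cubed standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return sum((x - m) ** 3 for x in xs) / (n * s ** 3)

random.seed(1)
# Hypothetical colony counts with a long right tail.
counts = [random.lognormvariate(mu=5, sigma=1) for _ in range(500)]
logs = [math.log10(c) for c in counts]

print(skewness(counts))  # strongly positive: a long right tail
print(skewness(logs))    # near zero: the log scale is roughly symmetric
```

On the original scale the skewness is large and positive; on the log scale it is close to zero, which is why log-transformed counts are often suitable for parametric methods.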

DESCRIPTIVE ANALYSES

It is beyond the scope of this article to give any more than a minimal introduction to descriptive statistical methods. Descriptive analyses may be grouped into three general categories: (i) measures of central tendency, (ii) measures of variability, and (iii) graphic displays.

Measures of central tendency are statistics that describe the center of a sample of data. These statistics are supposed to estimate what a "typical" data value should be. The mean (or average value) is calculated by adding all of the data points in the sample and then dividing this sum by the number of data points. The mean can be thought of as the center of gravity of the sample or the balance point of the distribution. The sample mean has many highly desirable mathematical properties that make it very useful for comparing one sample with another, but it has the undesirable property that it may be highly influenced by outliers in the data or by data that have an asymmetric distribution. One or two very high or very low values will strongly pull the mean away from the center of the distribution.

The median is the halfway point of the sample, the point at which half of the data are below the point and half of the data are above the point. The median is not strongly influenced by asymmetry or outliers in the data, and in many cases it is a much better estimate of a typical data value than is the mean.

There are several statistics that attempt to describe the variability of the sample data. They all try to measure how concentrated or, conversely, how dispersed the data are. The simplest of these statistics are the minimum, the maximum, and the range (maximum minus the minimum). These statistics have the virtue that they are simple to understand, but their values are a function of the sample size. For continuous data, as the sample size increases, the minimum decreases, and both the maximum and the range increase.

Another way to describe the variability of the data is to estimate percentiles of the distribution such as the 25th percentile and the 75th percentile. These have the property that 25% of the data fall below the 25th percentile and 25% fall above the 75th percentile. Often the interquartile range (the 75th percentile minus the 25th percentile) is quoted.

The most common and most abused statistic that describes variability is the sample standard deviation. The standard deviation is the square root of the sample variance, and the sample variance is the sum of the squared differences between the sample data points and the sample mean, all divided by the sample size minus 1. Many investigators believe that 95% of the sample data lie within 2 standard deviations of the mean. This is true when the underlying distribution is Gaussian, but, as mentioned before, this is rarely the case in most biomedical settings.

Finally, when attempting to describe data, there is no better way than to use the "interocular test," that is, to display the data visually in graphical form. Yogi Berra once said, "You can see a lot just by lookin'," and this is particularly true with experimental data. O'Brien and Shampo (15, 16) give excellent examples of how to display data with histograms, frequency polygons, cumulative distribution polygons, and scatter diagrams.
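The sensitivity of the mean, and the robustness of the median, can be demonstrated with a few lines of Python's standard statistics module (the data here are hypothetical times to culture positivity, not from the article):

```python
import statistics

# Hypothetical times (h) to culture positivity; one extreme value creates a long tail.
times = [12, 14, 15, 15, 16, 17, 18, 20, 22, 96]

mean = statistics.mean(times)      # pulled upward by the outlier: 24.5
median = statistics.median(times)  # a better "typical" value here: 16.5
q1, q2, q3 = statistics.quantiles(times, n=4)  # 25th, 50th, 75th percentiles
sd = statistics.stdev(times)

print(mean, median, q3 - q1)
```

With the single outlying value of 96 h, the mean (24.5) exceeds every other observation in the sample, while the median (16.5) and the interquartile range still describe the bulk of the data.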

INDEPENDENT VERSUS DEPENDENT OBSERVATIONS

In addition to the nature of the variable being studied, the appropriate choice of statistical methodology is a function of whether the comparisons are made between independent or dependent experimental units. If two or more samples of experimental units are to be compared and the experimental units in one sample are not used again in the other sample, the observations in each experimental sample are independent of one another unless the units in one sample have been matched on a one-to-one basis with the units in the other sample. An example of two independent samples is a study of the effectiveness of gentamicin versus ciprofloxacin in the treatment of infected mice in which the mice have been randomly assigned to the two treatment groups.

When the same experimental unit is tested repeatedly under two or more experimental conditions, the design is a dependent design. The results in one sample are correlated with the results from another sample because the same experimental units are tested in both groups. Another type of dependent design is one in which the experimental units in one sample have been chosen to match on a one-to-one basis the experimental units in another sample. Typically, this matching is done on factors that are related to the response variable. These designs are known as paired or matched studies. When more than two measurements are made on the same experimental unit, the study is called a repeated-measures design. An example of this would be the same patients evaluated before treatment with an antibiotic, 1 month after treatment, and 2 months after treatment.

HISTORICAL VERSUS PROSPECTIVE STUDIES

Comparisons of various treatments on, for example, their effectiveness in eradicating bacterial infections are made in two general ways: (i) by analyzing historical data not collected with a rigid protocol, and (ii) by planning a prospective study. The former type of study is known as a historical or retrospective study. Retrospective studies suffer from many weaknesses, the chief one being selection bias. If, for example, there are only two antibiotics to choose from, the attending physicians would choose the antibiotic which they think is best for the patient based on the characteristics of the patient. Typically, then, the patients who receive one antibiotic are qualitatively different from the patients who receive the other antibiotic. Therefore, one does not know whether an observed difference or lack of difference in outcome is due to the effectiveness of the antibiotics or to the differences in the nature of the two patient groups. Some historical studies, such as case-control studies, can be well designed and conducted by using a strict written protocol, but biases can still exist and interpretation of the results of such studies may be difficult.

The problems of the historical study can be alleviated when the study is designed in advance and carried out prospectively. With appropriate treatment randomization and, ideally, both the patient and the evaluating clinician unaware of which treatment has been received, unbiased estimates of treatment response may be made (5).

ESTIMATION VERSUS TESTING, ERROR RATES, AND STATISTICAL POWER

It is important for an investigator to understand the difference between estimation and hypothesis testing. In almost every experiment, one of the goals is to determine the value of a true underlying population characteristic, for example, the median time to culture positivity of specimens from patients infected with a given organism. This is a process called estimation (17). In addition to the estimate of the true population parameter, it is common practice to also give 95% confidence intervals for the parameter (4). These intervals have the property that, if the experiment were repeated many times, 95% of the calculated confidence intervals would include the true unknown value of the population parameter.

The 95% confidence limits usually are calculated in the form

estimate ± 1.96 (standard error of the estimate)

where 1.96 is the 97.5th percentile of the Gaussian distribution with mean = 0 and variance = 1.


For the case in which one is interested in estimating the proportion, P, the number of times in which an event will occur out of n independent trials, one computes the confidence interval in the following way:

P ± 1.96 √[P(1 − P)/n]

where the square root term is the standard error of P.

For example, if a new screening test for the detection of Chlamydia trachomatis correctly classifies 80 of 100 infected patients, the estimate of sensitivity is P = 80/100 = 0.8 and the 95% confidence interval for the true unknown sensitivity is

0.8 ± 1.96 √[0.8(1 − 0.8)/100] = 0.8 ± 0.08, or (0.72 to 0.88)
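The interval above is straightforward to compute. A minimal sketch (the function name is ours, not from the article):

```python
import math

def prop_ci_95(successes, n):
    """95% confidence interval for a proportion, normal approximation."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of P
    return p - 1.96 * se, p + 1.96 * se

# The Chlamydia trachomatis sensitivity example: 80 of 100 infected patients detected.
lo, hi = prop_ci_95(80, 100)
print(round(lo, 2), round(hi, 2))  # 0.72 0.88, matching the interval in the text
```

Note that this normal approximation is what the article uses; it is adequate when n is large and P is not close to 0 or 1.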

For the case in which, from a sample of n observations of a continuous variable, one estimates the mean, x̄, and the standard deviation, S, of the sample, the 95% confidence interval for the true mean is

x̄ ± t(0.975; n − 1) S/√n

where the quantity S/√n is called the standard error of the mean and t(0.975; n − 1) is the 97.5th percentile of the Student t distribution with n − 1 df. For n of >40, this percentile of the t distribution may be approximated by 2.

For example, if the mean time of detection of the early antigen of cytomegalovirus in 60 patients is 16 h and the standard deviation is 8 h, then a 95% confidence interval for the true mean is

16 ± 2 √(8²/60) = 16 ± 2, or (14 to 18)
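A parallel sketch for the mean, using the article's approximation of 2 for the t percentile when n > 40 (function name ours):

```python
import math

def mean_ci_95(xbar, s, n):
    """Approximate 95% CI for a mean: xbar ± 2 * s / sqrt(n); the multiplier
    2 stands in for the t percentile and is reasonable only for n > 40."""
    half_width = 2 * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# The cytomegalovirus early-antigen example: mean 16 h, SD 8 h, 60 patients.
lo, hi = mean_ci_95(16, 8, 60)
print(round(lo), round(hi))  # 14 18, as in the text
```

For smaller samples, the exact Student t percentile with n − 1 df should replace the multiplier 2.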

If the goals of an experiment include comparison of the parameter estimate to a hypothetical value or comparisons of estimates obtained under two or more experimental conditions, one may wish to perform statistical tests of hypotheses. Hypothesis tests, or significance tests, are usually formulated in terms of null and alternative hypotheses. The null hypothesis typically states that the population parameter is equal to some hypothesized value or, in the case of two samples, that the two population values are equal, for example, the median times to positivity in two different culture media. One rejects the null hypothesis when the evidence from the sample(s) suggests that the observed results in repeated experiments would have been very unlikely if the null hypothesis were true. Very unlikely is conventionally accepted to be <5%. The probabilities associated with hypothesis testing may be explained by examining Table 1, in which α is the "type 1 error rate," which is the probability of rejecting a true null hypothesis. As stated before, α is usually set in advance, typically at 0.05, and can be controlled by the significance limits of the statistical methods used. β is the "type 2 error rate," which is the probability of failing to reject a false null hypothesis. 1 − β is called the "power" of the test. The power is the probability that the test will be significant for a given departure from the null hypothesis.

TABLE 1. Hypothesis testing

Decision based on outcome      Null hypothesis (Ho)
of statistical test            Actually true    Actually false
Do not reject Ho                   1 − α              β
Reject Ho                            α              1 − β

The power of the specific test can be improved by increasing the sample size, but it also depends on α, the actual value of the difference in parameters, and the nature of the statistical test. When an experiment is planned, it is imperative to have sufficient statistical power to detect any "clinically important" differences. If the decision of a significance test is not to reject the null hypothesis, but, before being started, the study had almost no chance of rejecting the null hypothesis for any meaningful clinical difference, the study has been uninformative. In well-planned studies, the sample size is chosen such that the power is at least 0.80, and ideally 0.90, of detecting the smallest difference that would be clinically meaningful. Planning sample size is beyond the scope of this paper, but the interested reader should refer to references 1, 2, 3, 5, and 7.
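Power is easiest to appreciate by simulation. The sketch below is our construction, not from the article: it estimates the power of a simple two-sided normal-approximation test of H0: p = 0.5 with n = 50 when the true proportion is actually 0.7:

```python
import math
import random

def power_by_simulation(p_true, p_null, n, alpha_z=1.96, sims=10000):
    """Fraction of simulated experiments in which H0: p = p_null is rejected."""
    rejections = 0
    for _ in range(sims):
        successes = sum(random.random() < p_true for _ in range(n))
        p_hat = successes / n
        se0 = math.sqrt(p_null * (1 - p_null) / n)  # standard error under the null
        if abs(p_hat - p_null) / se0 > alpha_z:
            rejections += 1
    return rejections / sims

random.seed(42)
power = power_by_simulation(0.7, 0.5, 50)
print(power)  # close to the analytic value of about 0.86
```

Rerunning with smaller n, or with p_true closer to 0.5, shows the power dropping quickly, which is exactly the "uninformative study" risk described above.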

STATISTICAL HYPOTHESIS TESTS

The following subsections describe statistical methodologies that are usually appropriate for the given experimental situations. The reader should be aware that these are only general guidelines and that in many cases more sophisticated methods should be used. When there is any doubt about the appropriate method of analysis in an actual experiment, or of appropriate sample selection or appropriate methods to handle missing data, the aid of a professional statistician should be enlisted. Examples will be given for some (but not all) methods. Table 2 is an outline of univariate methods which are applicable under the described combinations of variable type and study design.

TABLE 2. Significance tests, by type of observations, number of samples, and nature of variable (references in parentheses)

I. Dependent observations, 2 samples
   Nominal: sign test or McNemar's test (20)
   Continuous Gaussian: paired t test (18)
   Ordinal or continuous non-Gaussian: Wilcoxon signed-ranks test (20)
II. Independent observations, 2 samples
   Nominal: relative deviate test (19) or chi-square test (20)
   Continuous Gaussian: two-sample t test (4)
   Ordinal or continuous non-Gaussian: Wilcoxon rank sum test (4, 13)
III. Dependent observations, 3 or more samples
   Nominal: Cochran Q test (20)
   Continuous Gaussian: repeated-measures analysis of variance (12, 21)
   Ordinal or continuous non-Gaussian: Friedman's procedure (20)
IV. Independent observations, 3 or more samples
   Nominal: chi-square test (20)
   Continuous Gaussian: one-way analysis of variance (4)
   Ordinal or continuous non-Gaussian: Kruskal-Wallis one-way analysis of variance (20)

Two Samples with Dependent or Paired Observations

Nominal data. A common experiment in clinical microbiology is one in which two culture media are compared in terms of their ability to grow a specific type of microorganism. In these experiments it is common practice to inoculate half of each patient's sample into each culture medium and to observe whether growth subsequently occurs in each medium. A hypothetical example of such an experiment is shown in Table 3.

It is necessary to recognize that this is not the usual kind of two-way frequency table in which the results from two independent groups are compared. In Table 3, 100 samples are evaluated in two different media. The object is to compare the proportion positive with medium 2 (30 + 35 = 65)/100 with that positive with medium 1 (30 + 15 = 45)/100. Note that, for the proportions to be equal, the numerators must be equal and that the 30 samples that were positive on both media appear in both numerators. Therefore, the comparison of the proportions is reduced to comparing the number 15 to the number 35 (called discordant pairs). The question is whether the numbers 15 and 35 could each have reasonably resulted by sampling from a binomial experiment of n = 15 + 35 = 50 trials with a true probability equal to one-half. This is a situation in which the sign test (20) is appropriate. A table of critical frequencies for the sign test (4) indicates that, at the α = 0.01 significance level for a two-sided test with a sample size of 50, the critical frequency is 15. The smaller of the two numbers is 15 and is equal to this value. Therefore, it should be concluded that the proportion of positive cultures is greater for medium 2 than for medium 1 (P = 0.01). An alternative to the sign test known as McNemar's test (20) is calculated by squaring the difference in the two discordant pairs and dividing this quantity by the sum of the discordant pairs. This ratio approximates a chi-square distribution with 1 df when the hypothesis of no difference in positivity rates is true. In this example we have

χ² = (35 − 15)²/(35 + 15) = 400/50 = 8.0

A table of the chi-square distributions (1) with 1 df shows that this statistic is greater than the 99.5th percentile of the distribution and, once again, we conclude that the proportion of positivity is greater in medium 2 than in medium 1. Some statisticians suggest that a correction for continuity be used for McNemar's test. This is done by subtracting 1 from the absolute value of the difference in the numerator before squaring.

Continuous Gaussian data. When hearing in patients is measured in decibels before and after treatment with an antibiotic, the paired t test (18) would be appropriate if the distribution of change in decibels is well behaved with no appreciable skewness and no outliers. This is a common test and no example will be given. One concern with this design is that hearing could decrease over time in the absence of treatment. The use of a randomized placebo comparison group would alleviate this problem.
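The discordant-pair logic of the sign test and McNemar's test described above is simple to verify in code. A minimal sketch (helper names ours; the exact sign test uses the binomial distribution directly rather than a table of critical frequencies):

```python
import math

def mcnemar_statistic(b, c):
    """Chi-square statistic (1 df) from the two discordant-pair counts."""
    return (b - c) ** 2 / (b + c)

def sign_test_p(b, c):
    """Exact two-sided sign test: doubled tail of binomial(b + c, 1/2)."""
    n, k = b + c, min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Table 3: 35 samples positive only on medium 2, 15 positive only on medium 1.
print(mcnemar_statistic(35, 15))  # 8.0, as computed in the text
print(sign_test_p(35, 15))        # well below the conventional 0.05
```

Both approaches agree with the article's conclusion that medium 2 yields proportionately more positive cultures.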

Continuous non-Gaussian data. If, in the previous example, the distribution of change in decibels was appreciably non-Gaussian, a more appropriate test than the paired t test would be Wilcoxon's signed-rank test (20). The reference gives a good example of this application.

Ordinal data. In clinical microbiology, variables are often

TABLE 3. Example of a dependent nominal table

Medium 1Medium 2 Total

Positive Negative

Positive 30 35 65Negative 15 20 35

Total 45 55 100

portrayed in ordinal terms such as titers or in logarithms.Table 4 is a hypothetical example of the bacterial count(log1o) in urine of patients before and after administration ofan antimicrobial drug. For example, 8 of 100 patients studiedhad log counts of 4 (10,000 bacteria) before treatment butwere negative after treatment.Note that the (3 + 4 + 4 + 4 + 3 = 18) results down the

diagonal from upper left to lower right are the same bothbefore and after treatment and do not contribute to thedifference in the counts. The frequencies that are one valueabove the diagonal had counts that were one step higherbefore than after the drug (5 + 5 + 5 + 5 = 20). These arecounterbalanced by the frequencies that were one step lowerbefore than after the drug (2 + 3 + 3 + 2 = 10). If we rankedthe differences from before to after treatment, all 30 of thesepatients would have tied ranks equal to 15.5, which is theaverage of the ranks 1 to 30. In the same manner, the (6 + 6+ 6 = 18) values two steps above the diagonal are counter-balanced by the (2 + 1 + 2 = 5) values two steps below thediagonal, and each would receive a tied rank of 42, theaverage of the ranks 31 to 53. Likewise, the (8 + 8 = 16)values three steps above the diagonal are counterbalancedby the (2 + 2 = 4) values three steps below the diagonal, andeach would receive a tied rank of 63.5, the average of theranks 54 to 73. Finally, the eight values four steps above thediagonal are counterbalanced by the one value four stepsbelow the diagonal, and each would receive a tied rank of 78,the average of the ranks 74 to 82. The ranks for the valuesbelow the diagonal can now be summed [(10 x 15.5) + (5 X42) + (4 x 63.5) + (1 x 78)] = 697. In the same fashion, theranks for the values above the diagonal can be summed [(20x 15.5) + (18 x 42) + (16 x 63.5) + (8 x 78)] = 2,706. Notethat the sum of the ranks below and above the diagonal (697+ 2,706 = 3,403) equals the sum of the integers from 1 to 82[(82 x 83)/2], where 82 is the number of values that are not

TABLE 4. Example of a dependent ordinal table

Before treatmentAfterToa

treatment 0- 1 2 3 4 Totalnegative

0= negative 3 5 6 8 8 301= +1 2 4 5 6 8 252= +2 2 3 4 5 6 203= +3 2 1 3 4 5 154= +4 1 2 2 2 3 10

Total 10 15 20 25 30 100

222 ILSTRUP

on Novem

ber 7, 2020 by guesthttp://cm

r.asm.org/

Dow

nloaded from

Page 5: Statistical Methods in Microbiology · value of a true underlying population characteristic, for example, the median time to culture positivity ofspecimens from patients infected

STATISTICAL METHODS IN MICROBIOLOGY 223

TABLE 5. Example of an independent nominal table

Growth Noncentrifuged Centrifuged Total

Yes 10 20 30No 40 30 70

Total 50 50 100

the same before and after treatment. To compare the ranksums 697 and 2,706, one uses Wilcoxon's signed-rank test(20), where we find the following test statistic:

N(N + 1)

4

/N(N + 1)(2N + 1)

\/ 24

(82)(83)2706

4

(82)(83)(165)

24

2706 - 1701.5= 4.644

216.313

Under the null hypothesis of no change, this statisticshould follow a normal distribution. This statistic is greaterthan the 99.9995th percentile of the normal distribution.Therefore, we conclude that the median log bacterial countafter the administration of the drug is much less than thatbefore the drug (P < 0.00001).The above calculations, for simplicity, did not include a

correction to the significance test for tied ranks. The refer-ence gives formulas that adjust the test statistic for tiedranks, and these adjustments should be used. In practice,these calculations, as seen in this example, are tedious, andthe use of a well-tested computerized statistical package isrecommended.

Two Samples with Independent Observations

Nominal data. If samples of a standard pool of cytomegalovirus are inoculated into shell vials and these vials are then randomly processed either with or without centrifugation, the groups of vials are independent of one another and the results of the experiment can be displayed as in Table 5.

In contrast to Table 3, the object here is to compare the two proportions 10/50 and 20/50. O'Brien and Shampo (19) give a good explanation of how to analyze this type of table from the point of view of the difference in proportions relative to the unknown common proportion. The chi-square test, however, is much easier to perform because of the existence of a simple computing formula (20). In our example, the formula becomes

chi-square (1 df) = [(10)(30) - (20)(40)]^2 (100) / [(50)(50)(70)(30)] = 25,000,000/5,250,000 = 4.76

This value is >3.84, which is the 95th percentile of the chi-square distribution with 1 df, and we conclude that cytomegalovirus grew proportionately more often when the

TABLE 6. Example of an independent ordinal table

                        Growth
Treated    1 = None   2 = Minimal   3 = Moderate   4 = Confluent   Total
No            20          15             10               5          50
Yes           10          15             15              10          50
Total         30          30             25              15         100
Avg rank     15.5        45.5           73              93

shell vials were centrifuged than when they were not centrifuged (P < 0.05).
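As a check on the arithmetic, the 2 x 2 computing formula can be evaluated directly from the Table 5 counts; a minimal Python sketch:

```python
# Counts from Table 5: a, b = growth (noncentrifuged, centrifuged); c, d = no growth
a, b, c, d = 10, 20, 40, 30
n = a + b + c + d

# Chi-square computing formula for a 2x2 table (1 df, no continuity correction)
chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
print(round(chi_square, 2))  # 4.76, which exceeds the 3.84 cutoff (P < 0.05)
```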

Continuous Gaussian data. If the underlying distribution of the study variable is approximately Gaussian and if the standard deviations in the two study groups are of similar magnitude, the two-sample t test (4) is the most powerful statistical test for detecting a shift in the centers of the distributions (the means in the case of the t test). Almost every personal computer and many hand calculators have programmed versions of the t test, but before trusting these programs the user should try an example such as that given in reference 4.

Continuous non-Gaussian data. Many distributions in microbiology, such as colony counts and radioactivity counts, are markedly skewed, thereby violating the assumptions of the t test. Occasionally, a mathematical transformation of the underlying distribution, such as the logarithmic transformation for non-negative and non-zero distributions, will allow the investigator to use the t test on the transformed scale and then make inferences back to the original scale. Many times, however, no simple transformation will yield a distribution that is sufficiently Gaussian such that the t test can be used. In these cases the Wilcoxon rank-sum test (4) is generally applicable. The rank-sum test can be more powerful than the t test when the distributions are non-Gaussian, and little power is lost compared with the t test even when the distributions are Gaussian.

Ordinal data. One of the most frequent errors in medical studies is using a chi-square test on a two-way frequency table when one of the variables is ordinal in nature. Moses et al. (13) point out that the correct analysis would use the Wilcoxon rank-sum test. Suppose that, instead of the dependent ordinal table given in Table 4, we now have two independent groups of cultures randomly treated with and without a growth-enhancing drug such as cycloheximide, and the response (growth of Chlamydia trachomatis) can be categorized only subjectively (in a blinded fashion) as 1 = no growth, 2 = minimal growth, 3 = moderate growth, and 4 = confluent growth. The results of such an experiment might be as found in Table 6.

Rank sum for no treatment = (15.5)(20) + (45.5)(15) + (73)(10) + (93)(5) = 2,187.5

Rank sum for treated = (15.5)(10) + (45.5)(15) + (73)(15) + (93)(10) = 2,862.5

A formula, without correction for ties, for a normal relative deviate used to compare the two rank sums is given in Dixon and Massey (4):

z = [T - N1(N1 + N2 + 1)/2 + 0.5] / sqrt[N1N2(N1 + N2 + 1)/12]




where T is the rank sum for the smaller sample size, N1 is the smaller sample size, and N2 is the larger sample size. In this example, this becomes

z = [2,862.5 - 50(101)/2 + 0.5] / sqrt[(50)(50)(101)/12] = 338.0/145.057 = 2.33

This value is greater than the 97.5th percentile of the normal distribution, and we conclude that there was greater growth in the treated group of cultures (P < 0.05). Once again, in practice, a formula with correction for tied ranks should be used, and most computer statistical packages include appropriate rank-sum tests.
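The rank-sum arithmetic for Table 6 is equally easy to script. This Python sketch mirrors the hand calculation (no tie correction, matching the text):

```python
import math

# Table 6 counts per growth category (1 = none ... 4 = confluent)
untreated = [20, 15, 10, 5]
treated = [10, 15, 15, 10]

# Midrank for each category across all 100 cultures
mids, start = [], 1
for total in (u + t for u, t in zip(untreated, treated)):
    mids.append((start + (start + total - 1)) / 2)
    start += total

T = sum(m * t for m, t in zip(mids, treated))   # rank sum for the treated group
n1 = n2 = 50
z = (T - n1 * (n1 + n2 + 1) / 2 + 0.5) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
print(T, round(z, 2))  # 2862.5 2.33
```

The midranks come out to 15.5, 45.5, 73, and 93, reproducing the "Avg rank" row of Table 6.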

Three or More Samples with Dependent Observations

Detailed descriptions of methods for three or more samples with dependent observations are beyond the scope of this article. The reader is encouraged to refer to examples given in the references.

When the same experimental unit is studied on more than two occasions under different experimental conditions, the appropriate statistical methods become much more complex. If the response variable is a no/yes or positive/negative variable, Cochran's Q test (20) may be used. If the response variable is Gaussian, repeated-measures analysis of variance (12, 21) should be used. Finally, if the response variable is continuous but non-Gaussian, or if it is ordinal, Friedman's procedure (20) should be used.

Three or More Samples with Independent Observations

When different experimental units are studied under three or more experimental conditions, the following statistical methods should be considered, depending on the nature of the response variable. If the response variable is nominal, chi-square methods (20) should be used. If the response variable is Gaussian, one-way analysis of variance (4) is appropriate. Finally, if the response variable is non-Gaussian or if it is ordinal, the Kruskal-Wallis one-way analysis of variance using ranks (20) should be used.
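As one illustration, the Kruskal-Wallis H statistic can be computed with a few lines of code. The following Python sketch uses hypothetical colony counts from three conditions (invented for illustration; the values contain no ties, so no tie correction is needed):

```python
from itertools import chain

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H (no tie correction); compare to chi-square with k - 1 df."""
    pooled = sorted(chain.from_iterable(groups))
    rank = {v: i + 1 for i, v in enumerate(pooled)}   # assumes no tied values
    n = len(pooled)
    return 12 / (n * (n + 1)) * sum(
        sum(rank[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)

# Hypothetical colony counts from three independent culture conditions
h = kruskal_wallis_h([12, 15, 11, 19, 14], [22, 25, 18, 27, 30], [31, 35, 28, 40, 33])
print(round(h, 2))  # 11.52, which exceeds 5.99 (95th percentile of chi-square, 2 df)
```

In practice, as with the two-sample tests above, a statistical package with a tie-corrected version of this test is preferable.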

OTHER STATISTICAL METHODS

When the relationships of two or more continuous variables to one another are of interest, correlation and regression methods are usually used. These methods are not addressed here. The interested reader should refer to an excellent book on this subject (6).

If the response variable of interest in a study is a no/yes variable that is a function of time from some starting point, then survival or actuarial methods should be used. For example, if survival after treatment with two or more experimental drugs is being studied, the endpoint is a function of how long each patient is under follow-up. Methods for estimating and comparing such endpoints are found in a book by Lee (8).

EVALUATING NEW DIAGNOSTIC TESTS

Many articles have appeared in the medical literature on evaluation of new tests. Two articles written for physicians are highly recommended as introductory references, one by McNeil et al. in the New England Journal of Medicine (10) and the other by Metz in Seminars in Nuclear Medicine (11). Another article, by McNeil and Hanley (9), is recommended for

TABLE 7. Format for a negative-positive diagnostic test

                      True state
New test result   Positive (D+)   Negative (D-)   Total
Positive (T+)          a               b           a + b
Negative (T-)          c               d           c + d
Total                a + c           b + d         N = a + b + c + d

a more detailed analysis of receiver operating characteristic(ROC) curves.

This section is divided into two parts: one for an experiment in which the true status of a patient is known (for example, diseased or not diseased) and the second for the situation in which the truth is not necessarily known but the results of another diagnostic test, considered to be a gold standard, are known.

True Patient Status Is Known

Suppose we are interested in whether a new diagnostic test accurately predicts whether a patient is infected with a particular organism, and suppose that we always know the true status of the patient, presumably from another test or examination that never makes an error but perhaps is excessively costly or time-consuming compared with the proposed new diagnostic test. In this example, there are two separate situations to consider: (i) the new test is either positive or negative, and (ii) the new test is ordinal (perhaps 0, +1, +2, etc.) or continuous in nature.

Negative or positive diagnostic test. When the new test is either negative or positive, the results from a series of patient evaluations can be displayed in a simple two-way frequency table.

In Table 7, a represents the number of patients truly infected that were correctly called positive by the new test, b represents the number of patients truly noninfected who were incorrectly called positive by the new test, etc. Several indices of new-test accuracy have been proposed for tables such as Table 7, but only three will be presented.

(i) Sensitivity = Pr(T+/D+) = a/(a + c). Sensitivity is the probability (Pr) that the new test will be positive (T+) when the patient truly is infected (D+) and is estimated by the ratio a/(a + c).

(ii) Specificity = Pr(T-/D-) = d/(b + d). Specificity is the probability that the new test will be negative when, in fact, the patient is not infected.

(iii) Positive predictive value = Pr(D+/T+) = a/(a + b). The positive predictive value, sometimes called the "diagnosibility" of the test, is the proportion of times that the patient will, in fact, be infected when the new test is positive.

For a new diagnostic test to be a "good" test, it is

desirable that the sensitivity and specificity be as high as possible, preferably 90% or greater. Sensitivity and specificity are independent of the prevalence of infection in the population being studied. This is not true for the positive predictive value, which is highly related to the population prevalence of infection, estimated by (a + c)/N. Tables 8 and 9 highlight this problem.

Both sensitivity and specificity are the same in Tables 8

and 9, but the positive predictive value is much lower in Table 9. This is because the prevalence of disease is much lower in Table 9 (100/10,100 = 0.99% versus 100/200 = 50%). This problem is particularly apparent when the disease being




TABLE 8. Example of a negative-positive diagnostic test with high positive predictive value(a)

                      True state
New test result   Positive (D+)   Negative (D-)   Total
Positive (T+)          95              10           105
Negative (T-)           5              90            95
Total                 100             100           200

(a) Sensitivity = 95/100 = 95%. Specificity = 90/100 = 90%. Positive predictive value = 95/105 = 90.5%.

studied is rare, such as infection with human immunodeficiency virus (editorial, Chance: New Directions for Statistics and Computing 1:9, 1988).

Ordinal diagnostic test. Table 10 is an example of a new diagnostic test that is not simply negative or positive but takes on ordinal values such as 0, +1, +2, etc.

For this new test, four different sets of sensitivity and specificity are possible, the values of which depend on which new-test values one considers positive. For example, if 0 is considered negative and +1 through +4 are considered positive, then the sensitivity is (10 + 15 + 30 + 40)/100 = 95/100 = 95%, and the specificity is 35/100 = 35%. If 0 or +1 is considered negative and +2, +3, and +4 are considered positive, then the sensitivity is (15 + 30 + 40)/100 = 85%, and the specificity is (35 + 20)/100 = 55%. In the same manner, the other two possible sets of sensitivity and specificity are (70%, 75%) and (40%, 90%).
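Sweeping the cutoff over the ordinal categories generates these four (sensitivity, specificity) pairs mechanically; a Python sketch from the Table 10 counts:

```python
# Table 10 counts at test values 0, +1, +2, +3, +4
diseased = [5, 10, 15, 30, 40]   # truly infected (D+)
healthy = [35, 20, 20, 15, 10]   # truly not infected (D-)

pairs = []
for cutoff in range(1, 5):       # call test values >= cutoff "positive"
    sensitivity = sum(diseased[cutoff:]) / sum(diseased)
    specificity = sum(healthy[:cutoff]) / sum(healthy)
    pairs.append((sensitivity, specificity))

print(pairs)  # (0.95, 0.35), (0.85, 0.55), (0.70, 0.75), (0.40, 0.90), as in the text
```

Each pair is one point on the ROC curve discussed next.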

If one plots sensitivity on the y axis and specificity on the x axis for the four possibilities from Table 10, one generates the curve displayed in Fig. 1. This is a ROC curve. It visually displays the effect on sensitivity and specificity that results from more stringent or less stringent definitions of positivity when a new diagnostic test is evaluated. ROC curves demonstrate that, in general, "you cannot have your cake and eat it too" when determining cutoff values for positivity of a new test. To have very high sensitivity, specificity is sacri-

TABLE 9. Example of a negative-positive diagnostic test with low positive predictive value(a)

                      True state
New test result   Positive (D+)   Negative (D-)    Total
Positive (T+)          95             1,000         1,095
Negative (T-)           5             9,000         9,005
Total                 100            10,000        10,100

(a) Sensitivity = 95/100 = 95%. Specificity = 9,000/10,000 = 90%. Positive predictive value = 95/1,095 = 8.7%.

TABLE 10. Example of an ordinal diagnostic test

                      True state
New test result   Positive (D+)   Negative (D-)   Total
0                       5              35            40
+1                     10              20            30
+2                     15              20            35
+3                     30              15            45
+4                     40              10            50
Total                 100             100           200

[Figure omitted: sensitivity (probability that the new test is positive when the patient is diseased, %) plotted on the y axis against specificity (probability that the new test is negative when the patient is not diseased, %) on the x axis.]

FIG. 1. Example of a ROC curve.

ficed and vice versa. It should be noted that many authors plot (1 - specificity) on the x axis instead of specificity and that either method is effective.

Plotting ROC curves is particularly useful when evaluating more than one new diagnostic test. The diagnostic test with higher sensitivity in an acceptable range of specificity is usually chosen as the preferred test unless that test is too costly or too dangerous to the patient.

When, instead of a test with ordinal values as in Table 10,

a test is evaluated that has a truly continuous distribution, the ROC curve becomes even more useful. The distribution of new-test values for truly positive patients overlaps with the distribution of test values for truly negative patients. Varying the value of the new test above (or below) which patients are classified as positive will generate a ROC curve in the same manner as that for Table 10, but with infinitely more points. In practice, fortunately, it is not necessary to evaluate the infinitely many points, and the number of cutoff points is chosen such that the resulting ROC curve is reasonably smooth.

Unknown True Patient Status

Often the true status of disease of a patient is unknown. This usually happens when, historically, disease status has been determined by another diagnostic test, sometimes known as the gold standard test. The gold standard test may not, and usually does not, have 100% sensitivity and specificity, but it has been in use for a long time, investigators are familiar with it, and for better or worse its results are assumed to be accurate. In this setting, the concepts of the sensitivity and specificity of a new test are ill defined. If you are willing to assume that the results of the old gold standard test are true, sensitivity and specificity may then be defined, but the inherent errors of the gold standard may then be perpetuated. The new test might actually be more accurate than the gold standard, but in this setting disagreement with the gold standard would be considered false-positive or false-negative results. It is my opinion that the concepts of sensitivity and specificity should not be used when, in fact, the true disease status of the patient is unknown. The




agreement of the new test and the gold standard test should be displayed in tabular or graphic form, and areas of disagreement should, if possible, be investigated by other diagnostic tests or, minimally, by following the patient to determine ultimately whether the patient clinically develops the disease. In some instances, whether a new test is better than an old test may never be known. Then the decision of whether to replace the old test with the new, or to use the new test in combination with the old test, should be made by using a cost-benefit analysis.

SUMMARY

Statistical methodology is viewed by the average laboratory scientist or physician sometimes with fear and trepidation, occasionally with loathing, and seldom with fondness. Statistics may never be loved by the medical community, but it does not have to be hated. It is true that statistical science is sometimes highly mathematical, always philosophical, and occasionally obtuse, but for the majority of medical studies it can be made palatable. The goal of this article has been to outline a finite set of methods of analysis that investigators should choose based on the nature of the variable being studied and the design of the experiment. The reader is encouraged to seek the advice of a professional statistician when there is any doubt about the appropriate method of analysis. A statistician can also help the investigator with problems that have nothing to do with statistical tests, such as quality control, choice of response variable and comparison groups, randomization, and blinding of assessment of response variables.

LITERATURE CITED

1. Beyer, W. H. 1986. CRC handbook of tables for probability and statistics, 2nd ed., p. 286-289, 294. CRC Press, Boca Raton, Fla.
2. Cochran, W. G., and G. M. Cox. 1957. Experimental designs, 2nd ed., p. 24-25. John Wiley & Sons, Inc., New York.
3. Cohen, J. 1977. Statistical power analysis for the behavioral sciences, revised ed. Academic Press, Inc., New York.
4. Dixon, W. J., and F. J. Massey, Jr. 1969. Introduction to statistical analysis, 3rd ed., p. 77-80, 116-118, 156-163, 344-345, 509. McGraw-Hill Book Co., New York.
5. Friedman, L. M., C. D. Furberg, and D. L. DeMets. 1985. Fundamentals of clinical trials, 2nd ed. PSG Publishing Co., Littleton, Mass.
6. Kleinbaum, D. G., L. L. Kupper, and K. E. Muller. 1988. Applied regression analysis and other multivariable methods, 2nd ed. PWS-Kent Publishing Co., Boston.
7. Lachin, J. M. 1981. Introduction to sample size determination and power analysis for clinical trials. Controlled Clin. Trials 2:93-113.
8. Lee, E. T. 1980. Statistical methods for survival data analysis. Lifetime Learning Publications, Belmont, Calif.
9. McNeil, B. J., and J. A. Hanley. 1984. Statistical approaches to the analysis of receiver operating characteristic (ROC) curves. Med. Decision Making 4:137-150.
10. McNeil, B. J., E. Keeler, and S. J. Adelstein. 1975. Primer on certain elements of decision making. N. Engl. J. Med. 293:211-215.
11. Metz, C. E. 1978. Basic principles of ROC analysis. Semin. Nuclear Med. 8:283-298.
12. Morrison, D. F. 1976. Multivariate statistical methods, 2nd ed., p. 141-153. McGraw-Hill Book Co., New York.
13. Moses, L. E., J. D. Emerson, and H. Hosseini. 1984. Statistics in practice: analyzing data from ordered categories. N. Engl. J. Med. 311:442-448.
14. O'Brien, P. C., and M. A. Shampo. 1981. Statistics for clinicians. 1. Descriptive statistics. Mayo Clin. Proc. 56:47-49.
15. O'Brien, P. C., and M. A. Shampo. 1981. Statistics for clinicians. 2. Graphic displays-histograms, frequency polygons, and cumulative distribution polygons. Mayo Clin. Proc. 56:126-128.
16. O'Brien, P. C., and M. A. Shampo. 1981. Statistics for clinicians. 3. Graphic displays-scatter diagrams. Mayo Clin. Proc. 56:196-197.
17. O'Brien, P. C., and M. A. Shampo. 1981. Statistics for clinicians. 4. Estimation from samples. Mayo Clin. Proc. 56:274-276.
18. O'Brien, P. C., and M. A. Shampo. 1981. Statistics for clinicians. 5. One sample of paired observations (paired t test). Mayo Clin. Proc. 56:324-326.
19. O'Brien, P. C., and M. A. Shampo. 1981. Statistics for clinicians. 8. Comparing two proportions: the relative deviate test and chi-square equivalent. Mayo Clin. Proc. 56:513-515.
20. Siegel, S. 1956. Nonparametric statistics for the behavioral sciences, p. 63-75, 75-83, 104-111, 161-166, 166-172, 175-179, 184-193. McGraw-Hill Book Co., New York.
21. Winer, B. J. 1971. Statistical principles in experimental design, 2nd ed., p. 261-283. McGraw-Hill Book Co., New York.
