
STATISTICS IN MEDICINE, VOL. 10, 1897-1913 (1991)
0277-6715/91/121897-17$08.50 © 1991 by John Wiley & Sons, Ltd.

STATISTICS IN MEDICAL JOURNALS: DEVELOPMENTS IN THE 1980s

DOUGLAS G. ALTMAN
Medical Statistics Laboratory, Imperial Cancer Research Fund, P.O. Box No. 123, Lincoln's Inn Fields, London WC2A 3PX, U.K.

SUMMARY This paper reviews changes in the use of statistics in medical journals during the 1980s. Aspects considered are research design, statistical analysis, the presentation of results, medical journal policy (including statistical refereeing), and the misuse of statistics. Despite some notable successes, the misuse of statistics in medical papers remains common.

1. INTRODUCTION

Although the widespread use of statistics in medical research is a relatively recent phenomenon, concern has been expressed over the misuse of statistics in papers published in medical journals for at least 60 years. Early examples are Dunn,1 who felt that half of a series of papers examined contained statistical errors, and Greenwood,2 who made a general observation about the prevalence of statistical errors.

Despite such isolated examples there was little serious investigation of the use of statistics in medical journals until the mid-1960s. However, in the 25 years or so since the important study of Schor and Karten3 there has been a massive number of studies of the use of statistics, with over 150 published by 1986.4 There have been occasional similar reviews in other fields, such as ecology, ornithology and agriculture (where concern seems to be largely restricted to the misuse (and over-use) of multiple comparison procedures), but nothing to compare with the huge number of studies of the medical literature. It is unclear why this should be, but it does not necessarily indicate that the standard of statistics is lower in medicine than in other fields, although this must be a possibility. It is symptomatic of the concern about the state of statistics in medicine that a recent book is devoted to describing methodological errors.5

The last 25 years have brought not only an increase in the use of statistics in medicine and evaluations of the misuse of statistics, but a large increase in the number of statisticians working in medical research. It is generally felt that the standard of statistics in medical journals has improved over this period, but a survey of all the reviews up to 1986 does not offer much support for this idea.6 One explanation is that the type of statistical analysis has also changed, mainly because of easy access to computers. In particular, multiple regression and multivariate methods are now within reach of anyone with access to a personal computer and relatively inexpensive statistical software. Doctors no longer need to consult a statistician for help with any analysis that they cannot perform using a calculator.

The growth of medical statistics as a branch of statistics can be seen in the founding of two specialist journals at the start of the decade (Controlled Clinical Trials in 1980 and Statistics in Medicine in 1982) and another due to appear in 1992 (Statistical Methods in Medical Research). Further, Statistics in Medicine has increased enormously in size during the decade, with volume 9 being more than three times the size of volume 1. The increasing proportion of 'medical' papers published in Biometrics was noted by Armitage,7 these being 25 per cent of published papers in 1971-1975 and 45 per cent in 1976-1981.

In this paper I will consider changes over the last decade in statistical methods employed in published papers and in the policies of medical journals. I will then consider changes in the misuses of statistics. Lastly I will offer some suggestions for future progress.

2. USE OF STATISTICS

The strongest evidence about changes in the use of statistics in medical papers would come from reviews of the same journals at both ends of the time period. Although there have been several statistical ‘content analyses’ published in the last ten years, most have examined one or more journals in a single short time period. Further, there is as yet little published information about the last few years, as a result of the inevitable time taken to perform and publish such studies.

2.1. Reviews of the literature

One of the few studies of more than one period compared papers published in Arthritis and Rheumatism in 1967-1968 and 1982.8 It thus indicates the changes in the period just before the last decade. The proportion of papers using statistics rose from 50 per cent (47/94) to 62 per cent (74/119). The authors found that the proportions of these papers using t and chi squared tests had risen from 17 per cent to 50 per cent and from 19 per cent to 30 per cent, respectively. For linear regression the increase was from 2 per cent (one paper) to 24 per cent. The proportion of papers using more than one statistical technique rose from 9 per cent to 41 per cent.

In order to provide some more recent quantitative information I carried out a survey of the statistical methods used in the first 100 'Original Papers' published in the New England Journal of Medicine in 1990. I have compared the findings with those of Emerson and Colditz,9 who studied the same journal in 1978-1979. Table I shows comparable data from the two surveys, using the groupings of Emerson and Colditz.9 (It also shows their figures for all articles.) The proportion of papers containing nothing beyond descriptive statistics fell from 27 per cent to 11 per cent. The use of simple methods remained fairly constant, but there was a doubling of the use of linear regression and non-parametric methods and a dramatic surge in the use of more complex methods. The most notable change was in the use of survival analysis (including logistic regression), notably multiple logistic regression and Cox regression. Among the 100 papers in my sample, 27 used these techniques. There was also a marked increase in the use of methods not included in any of the categories. It is clear from Table I that there has been a tendency for an increase in the number of different techniques used per paper, although this cannot be quantified precisely because papers may include more than one method in one grouping.

Clearly the New England Journal of Medicine is not representative of all journals. It is probable that papers published in most specialist journals are less statistically weighty on average. However, I am fairly sure that qualitatively similar changes have occurred over the period in journals of all sorts. I have refereed papers for one specialist medical journal for the last eight years, and my impression is that there has been a trend towards more complexity (or should I say sophistication?) in the statistical methods used. The trend is therefore to an increased use of methods not usually taught to medical students and possibly not even on postgraduate courses.


Table I. Proportion of papers in the New England Journal of Medicine using certain statistical methods of analysis in 1978-1979 (reference 9) and 1990.

                                                          1978-1979     1978-1979     1990
                                                          All papers    Original      Original
                                                                        papers        papers
Procedure                                                 (n = 760) %   (n = 332) %   (n = 100) %

No statistical method or descriptive statistics only          58            27            11
t test                                                         24            44            39
Contingency tables                                             15            27            30
Pearson correlation                                             7            12            17
Non-parametric tests                                            6            11            25
Regression for survival or logistic regression                  1             *            27
Life-table                                                      3             *            19
Other survival analysis                                         1             *            15
Epidemiological statistics                                      5             9            13
Simple linear regression                                        5             8            18
Analysis of variance                                            4             8            14
Transformation                                                  3             7             8
Multiple regression                                             3             5             6
Any survival analysis or logistic regression                    *            11            32
Non-parametric correlation                                      2             4             9
Multiway tables                                                 2             4             7
Multiple comparisons                                            2             3             5
Adjustment and standardization                                  2             3             1
Other methods                                                   2             3            19

Full explanation of all the above categories is given by Emerson and Colditz.9
* Not given.

Papers now report larger numbers of analyses than previously, yet the use of methods that aim to control the type I error rate is rare.10,11
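To make the point concrete, the short sketch below shows the two adjustments most often recommended for controlling the family-wise type I error rate, the Bonferroni correction and Holm's step-down procedure. It is an illustration only, written for this discussion; the P values are invented and are not taken from any of the reviews cited.

# A minimal sketch of two standard multiple-testing adjustments.
# The p-values below are invented for illustration only.

def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i if p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm(p_values, alpha=0.05):
    """Holm's step-down procedure: uniformly more powerful than Bonferroni."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break          # once one hypothesis is not rejected, stop
    return reject

p = [0.001, 0.012, 0.018, 0.04, 0.21]   # hypothetical P values for five endpoints
print(bonferroni(p))   # [True, False, False, False, False]
print(holm(p))         # [True, True, False, False, False]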

As a further approach to evaluating changes in the use of statistics over the decade, I investigated numbers of citations of some well known, and frequently cited, statistical publications, using the Science Citation Index for 1980 and 1989. The numbers of citations of Peter Armitage's 'Statistical Methods in Medical Research'12 for 1980 and 1989 were 293 and 553. For 1989 I also counted references to the second edition.13 For the paper by Peto et al.14 describing the logrank test there were 170 and 334 citations, almost all being from papers in medical journals. Thus for both a general text and a specialized paper there has been a large increase in citations, suggesting a corresponding increase in the use of statistical methods beyond the level of t, r and chi squared, simple methods which are rarely referenced (Andersen15 reports a similar investigation of the seminal paper by Cox16). The increase may be larger than these figures suggest, because of the ever wider choice of books and papers to cite. On the other hand, there has been an increase in the number of journals, and hence papers, published.

2.2. New developments in analysis

Apart from the aspects already mentioned, one of the most notable developments has been the new industry of meta-analysis for combining data from several similar studies. Although mostly used for controlled trials, the approach is increasing in epidemiology.17,18 There are different schools of thought about the right statistical approach, but general agreement that a quantitative synthesis of all available information is better than the more common subjective review of a selected subset. Several journals have carried editorials or articles explaining the principles,19-22 and there has been a large increase in published meta-analyses over the last few years, as is shown by the number of papers using the term (Figure 1). The most remarkable example of the use of meta-analysis is 'Effective Care in Pregnancy and Childbirth',23 a huge book (>1500 pages) containing hundreds of meta-analyses performed in a standard manner on all known randomized trials in perinatal medicine.

Figure 1. Number of papers referring to meta-analysis (Medline search for 1980-1989).
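As an illustration of the arithmetic underlying the simplest of these approaches, the sketch below performs a fixed-effect (inverse-variance) meta-analysis of log odds ratios. The 2 x 2 counts are invented and the code is a minimal illustration of the general idea, not the method of any particular published overview.

# A minimal fixed-effect (inverse-variance) meta-analysis on the log odds
# ratio scale.  The trial counts are invented for illustration.
import math
from statistics import NormalDist

trials = [           # (events_treated, n_treated, events_control, n_control)
    (12, 100, 20, 100),
    ( 8,  60, 14,  62),
    (30, 210, 41, 205),
]

num = den = 0.0
for a, n1, c, n2 in trials:
    b, d = n1 - a, n2 - c
    log_or = math.log((a * d) / (b * c))   # log odds ratio for one trial
    var = 1/a + 1/b + 1/c + 1/d            # its approximate variance
    w = 1 / var                            # inverse-variance weight
    num += w * log_or
    den += w

pooled = num / den
se = math.sqrt(1 / den)
z = NormalDist().inv_cdf(0.975)            # about 1.96 for a 95% interval
print("pooled OR %.2f (95%% CI %.2f to %.2f)" %
      (math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)))
# roughly 0.61 for these invented counts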

A widespread change has been the movement towards the use of confidence intervals. It is unclear why earlier calls for their use24,25 had no impact, but in the mid 1980s the publication of papers in several leading journals26-29 led to a radical shift of emphasis towards including confidence intervals in published papers. Before this time confidence intervals were rarely seen in medical journals, but 42 per cent of the 80 trials from 1987 studied by Altman and Doré30 included them, as did 33 per cent of the papers I examined in the New England Journal of Medicine in 1990. As yet there has not been much of a move towards excluding P values, but that is an obvious possibility for the future.31 As Bradford Hill recently commented,32 there is more chance of taking the medical community with us if we introduce new statistical ideas gradually. In this context, the endpoint may be Bayesian methods (but it may not33). Bayesian methods remain extremely rare in medical journals, except (often implicitly) in studies of diagnosis. However, there is increasing interest in empirical Bayes approaches.34-37
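The following sketch illustrates the style of reporting that these papers advocate: an estimate of the difference between two proportions with its 95 per cent confidence interval, shown alongside the corresponding P value. The counts are invented for illustration and the calculation uses the simple normal approximation.

# Estimation rather than testing alone: a 95% confidence interval for a
# difference between two proportions, with the usual P value for comparison.
# The counts are invented for illustration.
import math
from statistics import NormalDist

r1, n1 = 30, 80        # events / patients, treatment group (hypothetical)
r2, n2 = 18, 78        # events / patients, control group (hypothetical)

p1, p2 = r1 / n1, r2 / n2
diff = p1 - p2
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z95 = NormalDist().inv_cdf(0.975)
low, high = diff - z95 * se, diff + z95 * se

# two-sided P value from the standard test using the pooled proportion
p_pool = (r1 + r2) / (n1 + n2)
se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value = 2 * (1 - NormalDist().cdf(abs(diff / se0)))

print("difference %.3f, 95%% CI %.3f to %.3f, P = %.3f" % (diff, low, high, p_value))
# difference 0.144, 95% CI 0.003 to 0.286, P = 0.049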

2.3. Design

Despite the continuing publication of reviews of clinical trials highlighting errors,11,30,38 I believe that the last decade has seen an improvement in the methodology, analysis and reporting of clinical trials. Methods of randomization and sample size calculations are more often reported, and intention to treat analyses are now common. However, despite improvement the current standard is still by no means adequate. Altman and Doré30 found that prior sample size calculations were mentioned in 39 per cent of 80 published trials, a much higher proportion than in previous reviews. However, closer examination showed that in very few of these was it possible to reproduce the calculations from the information provided. Even the calculated sample size was often not given. Despite exhortations for large, simple trials,39 the median number of patients in published trials remains small. There has been an increase in the use of group sequential methods, and perhaps of fully sequential methods too, but neither is at all common yet. Trials which stop early as a result of interim analyses may be controversial.40,41
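For readers unfamiliar with what such a prior calculation involves, the sketch below gives the standard normal-approximation sample size formula for comparing two proportions. The assumed event rates, significance level and power are invented for illustration; they are not taken from any of the trials reviewed.

# A standard prior sample size calculation for comparing two proportions
# (normal approximation).  The assumed rates are invented for illustration.
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.90):
    """Approximate number of patients per group to detect p1 versus p2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar)) +
           z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

print(n_per_group(0.30, 0.20))   # 392 patients per group for these assumptions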

Most clinical trials are designed along conventional lines. Variations in trial design are slow to gain acceptance. Of particular interest are methods of adaptive allocation, notably the 'play-the-winner' rule,42,43 and randomized consent designs.44 Adaptive designs appear to have been used only very rarely. Two trials of extracorporeal membrane oxygenation (ECMO) that used both the play-the-winner rule and randomized consent45,46 have been highly controversial (Bartlett et al.45 claim that theirs was the first clinical study to use an adaptive design). There is a fascinating discussion of these trials by Ware and others.47 It is not likely that these trials have furthered the cause of these designs.48
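The sketch below simulates the simple deterministic form of the play-the-winner rule described by Zelen42: repeat the previous treatment after a success, switch after a failure. The success probabilities are invented, and the code illustrates only the general idea, not the (randomized) allocation procedures actually used in the ECMO trials.

# A simulation of the deterministic play-the-winner rule.  Success
# probabilities are invented for illustration.
import random

random.seed(1)
p_success = {"A": 0.80, "B": 0.50}   # hypothetical true success rates

def play_the_winner(n_patients):
    allocation, outcomes = [], []
    treatment = random.choice(["A", "B"])   # first patient allocated at random
    for _ in range(n_patients):
        allocation.append(treatment)
        success = random.random() < p_success[treatment]
        outcomes.append(success)
        if not success:                     # switch only after a failure
            treatment = "B" if treatment == "A" else "A"
    return allocation, outcomes

alloc, out = play_the_winner(50)
print("patients on A:", alloc.count("A"), "of", len(alloc))
# The better treatment tends to receive more patients, which is the rule's
# ethical appeal and also the source of its statistical difficulties.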

Zelen's randomized consent designs44 have been used somewhat more often, although still only rarely.49 These too have provoked much debate, on both statistical and ethical grounds.50 Zelen's prediction51 that 'these designs will dominate the field of randomized controlled clinical trials' seems unlikely to be fulfilled.

Special problems have arisen in relation to the design of clinical trials of treatments for AIDS.52 By contrast to these variations in design, the use of minimization53 seems widely accepted as a valid method of treatment allocation, although its use is not common. Problems with clinical trials have led to repeated suggestions that more use could be made of databases to compare treatments.54 However, there are serious potential biases with this approach.55,56

Although clinical trials represent a small proportion of published papers, there have been few reviews of other types of study. My impression is that there has been some general improvement in design. In particular, the need for a control group seems widely recognized even outside the field of controlled trials. This is an area where some literature reviews would be valuable.

3. MEDICAL JOURNAL POLICY

Despite a lack of quantitative information, I have the clear impression that the 1980s have seen a further increase in the number of papers on statistical methods published in medical journals. Series of articles have appeared in the New England Journal of Medicine, British Medical Journal, Journal of Clinical Pathology, British Dental Journal and Mayo Clinic Proceedings, and several others. Some of these have been republished as books. There are also many one-off papers on selected topics. Articles of this nature may appear in almost any medical journal. Journals also quite frequently carry editorials on statistical issues. Topics covered most often in recent years are probably meta-analyses19-22 and confidence intervals.28,57,58 (In each case the references cited are just a few of the many editorials I am aware of.)

Many medical journals seem willing to publish such articles, and also letters criticizing the use of statistics in papers they have published. I noted in the first issue of Statistics in Medicine that most journals gave much more attention to the format of references in submitted papers than they gave to the statistical content.59 This remains true. I also argued that there was a need for statistical guidelines for contributors to journals, and since 1982 several sets of guidelines have been published for clinical trials,60-62 epidemiological studies,63-65 the evaluation of diagnostic and screening tests,66,67 meta-analyses68 and more detailed general guidelines.69 The International Committee of Medical Journal Editors70 has included a somewhat expanded section on statistics in the third edition of the 'Vancouver' guidelines. However, this revised section is still too brief, requiring clarification and amplification by Bailar and Mosteller.71 In addition, sets of check lists produced to help statistical referees72 can also assist contributors and those wishing to assess published papers. Some of the papers giving guidelines also include check lists.60,65,67 There is, of course, no information available about the use of these guidelines by authors. At least authors can be directed towards such guidelines when asked to revise their manuscripts.

Journals have different styles of presentation of statistical material. There remains no consensus on notation for P values (P or p, roman or italic). While not of major importance, a standard notation would be desirable. Most journals continue to use the ± notation to connect means and either standard deviations or standard errors, despite frequent ambiguity and the misleading implications of such notation.73,74 Several journals no longer allow this usage.
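A small numerical illustration of the ambiguity, with invented measurements, is given below: the same data yield very different 'mean ± x' statements depending on whether x is the standard deviation or the standard error.

# Why a bare 'mean +/- x' is ambiguous: the standard deviation describes the
# patients, the standard error describes the precision of the mean, and they
# differ by a factor of sqrt(n).  The measurements are invented.
import math
import statistics

x = [5.1, 6.3, 4.8, 7.0, 5.9, 6.4, 5.2, 6.8, 5.5, 6.1]   # hypothetical data
mean = statistics.mean(x)
sd = statistics.stdev(x)              # sample standard deviation
se = sd / math.sqrt(len(x))           # standard error of the mean

print("mean +/- SD : %.2f +/- %.2f" % (mean, sd))   # 5.91 +/- 0.74
print("mean +/- SE : %.2f +/- %.2f" % (mean, se))   # 5.91 +/- 0.23
# Unless the text says which is intended, the reader cannot tell whether the
# interval describes the spread of the observations or the precision of the mean.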

The only type of statistical analysis that I know to have been banned is the comparison of the survival of patients who do or do not respond to treatment. This type of analysis, which is grossly misleading (see, for example, references 75 and 76), is either not allowed or strongly discouraged in some cancer journals.60,77 However, it can still be seen in other journals.78 Analyses that compare the survival of patients with respect to other information not available at the start of the follow-up period, such as toxicity or compliance with treatment, make the same error, but may still appear even in these journals.79

As mentioned earlier, many journals have published editorials and papers encouraging the use of confidence intervals. The British Medical Journal requires them to be used when appropriate,58 and this stance has recently been adopted by the Danish Medical Bulletin.80 As yet it seems that editorial enthusiasm for confidence intervals is largely restricted to clinical journals - in pharmacology, physiology, immunology and so on the P value culture remains largely intact.81 Another recent development has been the structured abstract,82 which is now required for clinical trial reports by several journals, and is seen increasingly for other types of study. Structured abstracts should provide the reader with a clear indication of the design of a study, the main outcome of interest, and the main results. While they cannot eliminate bias in which results are reported in an abstract, they should reduce it, partly by making it easier for referees to judge the fairness of the abstract.

In my 1982 paper59 I commented on rumours that some journals had a policy of accepting only papers in which the results were statistically significant. I subsequently discovered an editorial stating just such a policy,83 which tied in well with Sterling's almost contemporary observation that virtually all papers in four psychology journals presented statistically significant findings.84 There has been increasing evidence over the last ten years that the medical literature is subject to publication bias, whereby studies with statistically significant findings are more likely to be published than those with non-significant findings.85 Although there is some evidence that authors may be more to blame than editors,86 some editors have continued to show signs of discrimination.87 I had thought that views as extreme as Melton's were a thing of the past, but in 1990 an editorial by a journal editor put forward very similar ideas.88 Newcombe has suggested that refereeing should be a two-stage process, in which the referee first passes judgement on the methods of a study, and decisions on publication are made independently of the actual results obtained.89 It would be good to see an experimental evaluation of this suggestion. Less realistically, it has also been suggested that the original protocol be submitted with manuscripts describing results,90 to ensure that the study complied with the original intentions.

3.1. Statistical refereeing

The idea that the use of statistical referees should improve the statistical quality of published papers is self-evident, but there is little evidence to support it. One early study3 did provide some evidence that statistical refereeing is beneficial, and a recent study provides some further evidence.91 Of 45 published papers examined, only 5 (11 per cent) were considered statistically acceptable as originally submitted, compared with 38 (84 per cent) of the published papers. This is an area requiring further research.

George92 carried out an important survey of the attitudes and policies of medical journal editors with respect to statistical refereeing. Among the 83 journals supplying information, in 75 per cent the decision about whether a statistical review was needed was taken by the editor. In 7 per cent statistical opinions were rarely or never sought. Only 16 per cent had a policy that guaranteed a statistical review, 35 per cent of the journals had a statistical consultant or a statistician on the editorial board, and 12 per cent had published their policy on statistical reviewing. George's recommendations92 (similar to those of Altman59) were that:

(a) All papers should be reviewed by a statistician prior to publication (perhaps only after a favourable subject-matter review);
(b) Journals should recruit statistical reviewers;
(c) The statistical reviewers should at least be offered the option to see the revised manuscript;
(d) Journals should publish their policy on statistical review;
(e) Journals should adopt written standards or guidelines for statistical reporting (usually previously published guidelines).

Few journals score as many as two out of five. In my experience a key element in the successful implementation of statistical refereeing is for the statistician to produce a report in which all criticisms are numbered, and for the journal to require that the authors respond in writing to each point. (This approach need not be restricted to statistical referees.) It is usually simple for the referee to assess from the authors' response whether they have understood and implemented the changes, or why they have not done so (for example, the referee misunderstood some aspect of the study). Often the editor will be able to judge the authors' response, but it is preferable for the statistician to see it. In a similar spirit, Altman and Doré30 suggested that the authors might submit a check list with their papers, indicating where certain key items had been discussed or presented. I believe that statistical refereeing is an important role of the medical statistician, and one that can be educational. However, at present little or no credit is given for work of this kind, which must be squeezed into an already overcrowded schedule. Changes in the nature of statistical analyses over recent years mean that statistical refereeing of medical papers is more difficult than it was, and thus takes longer.

A major problem remains the lack of statistical understanding by referees and especially editors. Given that few journals can have all papers read by a statistician (which is ideal), editors do not have the expertise to judge which papers need a statistical opinion. It is depressing to see basic statistical flaws in papers published in a journal to which one is a statistical adviser, especially when one is credited as such at the front of the journal. Editorial staff could do more to ensure the full reporting of methods and reasonable styles of numerical and graphical presentation. To take a few recent examples I am aware of in major journals, it is not acceptable to describe a clinical trial as randomized when allocation was systematic,93,94 to include scale breaks in a histogram,95 to summarize statistical methods in just five words ('Conventional statistical methods were used'),96 to give correlation coefficients to six decimal places,97 or to present mean ± SD intervals that include impossible negative values.98,99 It does not take much statistical knowledge to detect errors such as these.

Another approach to improving the use of statistics in published papers is for journals to require authors to indicate who was responsible for the statistical analysis. Authors submitting papers to the Journal of the American Medical Association and other AMA journals (for example, Archives of Neurology) have been required to provide this information for several years. While this will not in itself improve statistics, it is another way of reinforcing its importance. (We may, however, wonder how many authors supply the information, given the common disregard for instructions to authors.)

4. MISUSE OF STATISTICS

4.1. Reviews of the literature

I have already observed that most reviews of the statistical methods used in medical journals are cross-sectional, and the same is true of studies of the quality of the statistical analyses. One comparative review that gives some valuable pointers to recent trends was the comparison of papers published in Arthritis and Rheumatism in 1967-1968 and 1982.8 Statistical errors were found in 60 per cent of papers in 1967-1968 and 66 per cent in 1982. However, this small difference masks some major changes, as Table II shows. There were far fewer unidentified statistical analyses in 1982, but there had been a huge increase in errors related to multiple testing. As the authors note, however, an increase in the frequency of errors may disguise a fall in the proportion of papers which make an error when using a certain technique (the error rate). They note that while the use of multiple t tests to compare three or more groups had increased sixfold (Table II), the error rate had fallen from 2/2 to 18/26. It would be valuable to have more such information. Unfortunately, most reviews of the literature provide either the frequency of use of certain methods or the frequency of specific errors. In general reviews have found statistical errors in about half of published papers.6 It is likely that, despite the consistent error frequency of around 50 per cent, the error rates are in fact dropping. We have no direct evidence to this effect, but it seems probable given that there has been an increase in the amount of statistical analysis in papers yet the proportion of papers with errors has stayed at around the same level.

Gøtzsche11 reviewed 196 published reports of trials of non-steroidal anti-inflammatory drugs published between 1959 and 1984. He gave a fascinating catalogue of errors in these papers. He found significant tendencies for later trials to score worse for design and better for analysis, but in each case the differences were slight. For the trials published in 1980-1984 the median score for analysis was three out of a possible eight. A different picture is seen in a review of 103 controlled trials of X-ray contrast media.100 As Table III shows, mean scores increased considerably between the 1960s and 1980s, although the range of scores widened. Liberati et al.101 used multiple regression to investigate possible influences on the quality of 63 randomized trials of treatment for primary breast cancer published over the same period. They found a statistically significant (although slight) improvement over time. They also found that the presence of a biostatistician in the research team was associated with a large increase of 11 (95 per cent CI: 2.6 to 20.1) in the mean score (papers were given scores out of 100).

We might reasonably consider that the state of statistics in medical journals at the end of the 1980s is of more concern than possible changes over the decade. One of the most recent reviews was of 80 clinical trials published in the Annals of Internal Medicine, British Medical Journal, Lancet and New England Journal of Medicine in 1987,30 and was mainly concerned with randomization and baseline comparability. For 30 per cent of trials there was no clear evidence that the trials had truly been randomized. In 10 per cent of trials comparisons of changes from baseline in the two treatment groups were made via within-group analyses, that is, by comparison of P values, rather than by a between-group comparison of the changes.10,38 Gøtzsche11 also found this serious error in 10 per cent of the trials in his sample. Altman and Doré30 found that confidence intervals were used in 42 per cent of the papers, but in a third of these the confidence intervals were inappropriately given for individual group estimates rather than for the difference between groups.
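The sketch below illustrates why comparing within-group P values is not a substitute for a between-group comparison. With invented summary data, one group's change from baseline is 'significant' and the other's is not, yet the direct between-group comparison shows no clear difference. It uses a simple normal approximation and is an illustration only.

# Why within-group P values mislead: invented summary data in which group A
# 'improves significantly', group B does not, but the groups barely differ.
import math
from statistics import NormalDist

def two_sided_p(z):
    return 2 * (1 - NormalDist().cdf(abs(z)))

# hypothetical mean change from baseline, SD of changes, and group size
mean_a, sd_a, n_a = 5.0, 12.0, 25
mean_b, sd_b, n_b = 3.0, 12.0, 25

se_a = sd_a / math.sqrt(n_a)
se_b = sd_b / math.sqrt(n_b)
print("within group A: P = %.3f" % two_sided_p(mean_a / se_a))   # about 0.04
print("within group B: P = %.3f" % two_sided_p(mean_b / se_b))   # about 0.21

se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
print("between groups: P = %.3f" % two_sided_p((mean_a - mean_b) / se_diff))   # about 0.56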


Table II. Statistical errors in papers published in Arthritis and Rheumatism8

Error                                                             1967-68 (n = 47)   1982 (n = 74)
Unidentified statistical method                                          30%               9%
Inadequate description of measures of location and dispersion           13%               9%
Two groups compared on > 10 variables at the 5 per cent level            6%              38%
Multiple t test used to compare three or more groups                     4%              24%

Table III. Quality of 103 trials of X-ray contrast media100

Decade   Number of trials   Mean score (out of 22)   Range
1960s           20                   9.1              5-13
1970s           34                  12.3              7-17
1980s           49                  15.4              5-21


Although no formal review was made of the statistical methods used in the 80 trials, during several readings of each paper many errors and peculiarities were noted among papers in all the journals. Some of these were:

(i) One-sided power calculation but two-sided analyses (in three papers);
(ii) Use of two sample t tests 'when the assumptions for analysis of covariance were not met';
(iii) Power calculation based on an endpoint not analysed;
(iv) The trial was stopped early (before the planned sample size) but no explanation was given;
(v) Large numbers of post-randomization exclusions;
(vi) Survival curves shown as frequencies (not proportions) for unequal size groups;
(vii) Results presented only for the two treatment groups combined;
(viii) Analysis of side-effects by the frequency of complaints rather than the number of affected individuals (the authors thus failed to note a highly significant difference);
(ix) The argument that a non-significant result was a type II error due to small sample size;
(x) A confidence interval for a proportion with a negative lower limit;
(xi) A confidence interval not including the point estimate;
(xii) A one-sided confidence interval;
(xiii) Intention to treat analysis not reported in abstract;
(xiv) Univariate analysis presented in abstract (P = 0.04) rather than adjusted Cox analysis (P = 0.1).

In addition, one trial described as randomized was excluded from the study because systematic allocation had been used.

This is not the standard readers expect from leading medical journals. It is likely that the standard of statistics would be no better, and quite possibly would be worse, in less illustrious journals. Also, clinical trials are probably in general of a higher statistical standard than other studies. The frequency of biased reporting in abstracts (see also References 11, 38) supports the ideas behind the introduction of structured abstracts.82


Despite clear signs of improvement in several respects, there are still far too many errors slipping through the net. I hope that most of the above errors were in papers not seen by statistical referees, but I am sure that some must have been. Guidelines for statistical referees do not exist as such, although some statisticians have written about their difficulties.102,103 The British Medical Journal's check lists72 were designed to help statisticians referee papers, by forcing them to consider specific features. Otherwise such attention to detail is usually only given by those carrying out reviews of published papers.

It is my impression that the trends noted by Felson et al.8 have continued throughout the 1980s. The main problems in analysis stem from too many analyses - multiple endpoints, multiple time points, multiple subgroup analyses, and so on. This is a clear symptom of the increased availability of computers, but it may also be a symptom of the never-ending drive towards P < 0.05 at all costs. The obsession with significant P values is seen in several other ways:

(i) Reporting of significant results rather than those of most importance (especially in abstracts);

(ii) The use of hypothesis tests when none is appropriate (such as for comparing two methods of measurement or two observers);

(iii) The automatic equating of statistically significant with clinically important, and non- significant with non-existent;

(iv) The designation of studies that do or do not 'achieve' significance as 'positive' or 'negative' respectively, and the common associated phrase 'failed to reach statistical significance'.

A review of 142 papers in three general medical journals found that in almost all cases (1076/1092) the researchers' interpretations of the 'quantitative' (that is, clinical) significance of their results agreed with the statistical significance.104 Thus across all medical areas and sample sizes P rules, and P < 0.05 rules most. It is not surprising if some editors share these attitudes, as most will have passed through the same research phase of their careers and some are still active researchers.

4.2. Conceptual errors

The need for statistical refereeing was mentioned earlier. The papers which really need to be seen by a statistician are not necessarily those containing the most complex statistical methods. While we should certainly be concerned about the use of inappropriate methods of analysis, some serious errors are more subtle. An example is the problem of relating change to initial value. There is a large literature on this topic, in both statistical and medical journals, over more than 30 years.59,105-108 Yet the basic flawed analysis that produces the correlation coefficient between change over time and initial value is still common.

Musso et al.109 obtained a correlation of r = 0.98 (P < 0.0001) between the change in free plasma norepinephrine after haemodialysis and baseline levels in nine patients. Boer110 pointed out the authors' mistake, although he did not give a full explanation. Musso et al.111 rejected the criticism, leaving non-statistical readers of Nephron (presumably almost all readers) either unclear about who was right or believing that the criticism was unfounded. Even if a further letter112 clarifies the situation, the lesson is surely that more effort is needed to prevent papers with important statistical errors being published. A statistical referee could have intercepted the original paper before publication, and insisted on a valid analysis. This case illustrates why correspondence columns are an inadequate safeguard against the misuse of statistics.59
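A small simulation, with invented data rather than the Nephron measurements, shows how such a correlation arises spuriously: when baseline and follow-up are both measured with error and there is no true relation between the fall and the underlying level, the observed fall is still correlated with the observed baseline value.

# Regression to the mean in miniature: no true treatment effect, yet the
# observed fall is correlated with the observed baseline.  Data are invented.
import random
import statistics

random.seed(42)
n = 1000
baseline, fall = [], []
for _ in range(n):
    true_value = random.gauss(100, 10)      # patient's underlying level
    b = true_value + random.gauss(0, 10)    # observed baseline (with error)
    f = true_value + random.gauss(0, 10)    # observed follow-up (no real change)
    baseline.append(b)
    fall.append(b - f)                      # apparent fall from baseline

r = statistics.correlation(baseline, fall)
print("correlation of fall with baseline: r = %.2f" % r)   # close to +0.5 here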


One of the arguments put forward by Musso et al.111 in favour of their analysis was that 'the data management we used is widely accepted in important scientific journals', and they cited two of their earlier papers. It is certainly true that analyses relating change to initial value are quite often published, but that does not make the method valid. This case thus also illustrates the potential 'copycat' effect of publishing erroneous statistical analyses.113

The example just discussed is related to regression to the mean, a topic that surfaces in many guises and still causes important errors. A recent paper illustrates one type of problem. Five obstetricians each re-evaluated the case notes of 50 women in labour who had had an emergency caesarean section for foetal distress.114 In 30 per cent of cases at least four of them felt that the caesarean had been unnecessary. This study was widely reported in the media, with headlines such as '30 per cent of caesareans unnecessary'. However, this conclusion is invalid because the design of the study was fatally flawed - the researchers should either have studied an unselected group or at least have taken a comparison group where no caesarean section was carried out despite foetal distress.115

Other conceptual errors include multiple counting of the same individuals, for example by analysing joints or teeth rather than individuals, and the comparison of survival of responders and non-responders75,76 (mentioned earlier). Examples of these and many other errors are given by Andersen.5

5. DISCUSSION

The introduction of statistical methods into medicine has been a gradual, evolutionary process. Perhaps deliberately, statisticians have often tried to introduce new ideas slowly. For example, Bradford Hill's original articles in the Lancet recommended alternate rather than random allocation.116 As he recently observed,32 he felt that this was a more acceptable idea at that time. (It is unbiased in principle, but unlikely to be so in practice.) It was 11 years before the first randomized trial was published.117 Another example is the wide recommendation to present confidence intervals as well as P values rather than instead of them. Omitting P values may come next.31

There are more important reasons for changes in the use of statistics in medicine over time - new methodology, more statisticians, and cheap and easily accessible computers and software. These have all led to more statistical analyses, and more complex analyses, which in turn have thrown up other difficulties such as those arising from multiple comparisons and subgroup analyses. The widespread introduction of confidence intervals has led to a number of new errors in the way that these are calculated and presented. It is to be expected, therefore, that studies of the use and misuse of statistics will show changes with time, even over as short a period as ten years. Changes in line with the above developments were noted in the 1970s, and have continued during the 1980s. Despite improvements in some areas it is clear from both formal and informal reviews of the literature that the misuse of statistics in medical journals is still far too common.

Almost ten years ago I considered the means whereby the misuse of statistics might be reduced.59 I suggested that long-term changes that could be hoped for include improvements in the statistical education of medical researchers, statistical representation on ethics committees, and a general attempt to persuade the medical establishment of the necessity of adequate statistical input. I fear that there has been little progress in these areas, at least in the U.K. The shortage of statisticians in U.K. medical schools remains a major worry.118 There are strong arguments for increasing the number of statisticians in medical research.113 They would not eliminate statistical errors, but their direct and indirect influence should certainly be of major benefit to the quality of medical research and thus the quality of published papers.


Fortunately, more progress is evident in the areas I identified as being amenable to short-term improvement. As noted earlier, there does seem to have been an increased awareness of the importance of statistics by editors of medical journals,59,119,120 although most journals still appear not to have any regular statistical refereeing system. It would be interesting now to repeat the survey of editors.92

Ten years ago there was a dearth of published statistical guidelines for authors, and I suggested that these would be valuable.59 Several sets of guidelines have now been published,60-69 but it is impossible to judge whether they are read or acted upon by potential authors. A cynic would doubt that either occurs often, and statisticians are noted cynics when it comes to such matters. While it is likely, for several reasons, that medical researchers tend to understand more about statistics than they did ten years ago, it is unlikely that this increase has kept pace with the dramatic changes in the statistical methods being used in medical papers, as evidenced by the data in Table I. Computer packages make these methods easily available, but reliance on the accompanying documentation for insight into their purpose, assumptions, and proper application is most unwise.113,121 I am not aware of any evidence that the advanced methods are being used where they are inappropriate, but I certainly have misgivings when I see a paper containing sophisticated logistic or Cox regression analyses with no indication of any input by a statistician. While I do not believe that only statisticians should be allowed to use such methods, I am sure that there should be more statisticians in medical research institutions. Statistics is fundamental, not incidental, to medical research, and should be accorded a proper place in the medical world. This was recognized by the Medical Research Committee (later to become the Medical Research Council) in its first 'Scheme of Research' as long ago as 1913.122 However, while the medical community frequently agree that statistics is a 'Good Thing', they are not in general willing to commit resources to statistics. This shortage is seen particularly in U.K. medical schools.118 Many controversies in medicine can be traced to statistical issues (References 123-125 are just a few examples).

Improving the statistical understanding of medical researchers must still be high on statisticians' agenda.113 While the 1980s have seen some signs of progress, there have also been some counterbalancing changes for the worse. The continued publication of misleading introductory statistics textbooks may have been a contributory factor,113,121 although there does seem to have been a recent increase in the number of good books on the market.

My suggestions for the future unfortunately largely echo those made at the beginning of the decade.59 I believe that there is still considerable scope for improved quality control in medical journals. It is impossibly idealistic to hope that we can stop the misuse of statistics, but we can apply a tourniquet to the lifeblood of the researcher - publication - by continuing to press journals to improve their ways. Errors in published papers should be pointed out in letters to editors, as should the need for more statistical referees. Journals seem willing to publish these. Likewise, the (lack of) quality of statistics in published papers is highlighted in formal reviews. Again journals are willing to publish these, and in many cases have changed their policy as a direct result of adverse findings. It would be valuable if future reviews compared two (or more) eras, and considered the frequency of both the use and misuse of statistical methods, thus allowing error rates to be calculated.

At present the medical statistician may get little or no credit for statistical consultancy, let alone work for journals. This situation needs to change. It is as ridiculous for the career progress of medical statisticians to be based primarily on research publication (that is, statistical research papers)126 as it is for the progress of doctors to be based on research publication. We rightly criticize the latter; we should be equally critical of the former.


REFERENCES

I . Dunn, H. L. ‘Application of statistical methods in physiology’, Physiological Reviews, 9, 275-398

2. Greenwood, M. ‘What is wrong with the medical curriculum?, Lancet, i, 1269-1270 (1932). 3. Schor, S. and Karten, I. ‘Statistical evaluation of medical journal manuscripts’, Journal ofthe American

4. Johnson, A. L. and Altman, D. G. ‘A survey of reviews of the quality of statistics in the medical

5. Andersen, B. Methodological Errors in Medical Research, Blackwell, Oxford, 1990. 6. Altman, D. G. and Johnson, A. L. ‘A survey of reviews of the quality of statistics in the medical

7. Armitage, P. ‘Biometry and medical statistics’, Biometrics, 41, 823-833 (1985). 8. Felson, D. T., Cupples, L. A. and Meenan, R. F. ‘Misuse of statistical methods in Arthritis and

Rheumatism. 1982 versus 1967-68’, Arthritis and Rheumatism, 27, 1018-1022 (1984). 9. Emerson, J. D. and Colditz, G. A. ‘Use of statistical analysis in the New England Journal of Medicine’,

New England Journal of Medicine, 309, 707-713 (1983). 10. Smith, D. G., Clemens, J., Crede, W., Harvey, M. and Gracely, E. J. ‘Impact of multiple comparisons in

randomized clinical trials’, American Journal of Medicine, 83, 545-550 (1987). 1 I . Getzsche, P. C. ‘Methodology and overt and hidden bias in reports of 196 double-blind trials of non-

steroidal antiinflammatory drugs in rheumatoid arthritis’, Controlled Clinical Trials, 10, 31-56 (1989). 12. Armitage, P. Statistical Methods in Medical Research, Blackwell, Oxford, 1971. 13. Armitage, P. and Berry, G. Statistical Methods in Medical Research, 2nd edn, Blackwell, Oxford, 1987. 14. Peto, R., Pike, M. C., Armitage, P., Breslow, N. E., Cox, D. R., Howard, S. V., Mantel, N., McPherson,

K., Peto, J. and Smith, P. G. ‘Design and analysis of randomized clinical trials requiring prolonged observation of each patient. 11. Analysis and examples’, British Journal of Cancer, 35, 1-39 (1977).

15 . Andersen, P. K. ‘Survival analysis 1982-1991: the second decade of the proportional hazards regression model’, Statistics in Medicine, 10, 1931-1941 (1991).

16. Cox, D. R. ‘Regression models and life tables’ (with discussion), Journal ofthe Royal Statistical Society, Series B, 34, 187-220 (1972).

17. Greenland, S. ‘Quantitative methods in the review of epidemiologic literature’, Epidemiologic Reviews,

18. MacMahon, S., Peto, R., Cutler, J., Collins, R., Sorlie, P., Neaton, J., Abbott, R., Godwin, J., Dyer, A. and Stamler, J. ‘Blood pressure, stroke, and coronary heart disease. Part 1, prolonged differences in blood pressure: prospective observational studies corrected for the regression dilution bias’, Lancet,

(1929).

Medical Association, 195, 1123-1 128 (1966).

literature. I. Methods and bibliography (pre-l987)’, in preparation.

literature. 111. Findings’, in preparation.

9, 1-30 (1987).

335, 765-774 (1990). 19. ‘Whither meta-analysis’ (editorial), Lancet, i, 897-898 (1987). 20. L’Abbt, K. A., Detsky, A. S. and O’Rourke, K. ‘Meta-analysis in clinical research’, Annals ofInternal

21. Light, R. J. ‘Accumulating evidence: using meta-analysis to carry out research reviews in pediatrics’,

22. Simon, R. ‘Overviews of randomized clinical trials’, Cancer Treatment Reports, 71, 3-5 (1987). 23. Chalmers, I., Enkin, M. and Keirse, M. J. N. C. (eds) Efjctioe Cure in Pregnancy and Childbirth, Oxford

24. Wulff, H. R. ‘Confidence limits in evaluating controlled therapeutic trials’, Lancet, ii, 969-970 (1973). 25. Rothman, K. J. ‘A show of confidence’, New England Journal of Medicine, 229, 1362-1363 (1978). 26. Gardner, M. J. and Altman, D. G. ‘Confidence intervals rather than P values: estimation rather than

27. Simon, R. ‘Confidence intervals for reporting results of clinical trials’, Annals of Internal Medicine, 105,

28. Rothman, K. J. and Yankauer, A. ‘Confidence intervals vs significance tests: quantitative interpreta-

29. Bulpitt, C. J. ‘Confidence intervals’, Lancet, i, 488 (1987). 30. Altman, D. G. and Dore, C. J. ‘Randomisation and baseline comparisons in clinical trials’, Lancet, 335,

Medicine, 107, 224-233 (1987).

Pediatrics, 78, 1445-1447 (1986).

University Press, Oxford, 1989.

hypothesis testing’, British Medical Journal, 292, 746-750 (1986).

429-435 (1986).

tion’, American Journal of Public Health, 76, 587-588 (1986).

149-153 (1990). 31. Evans, S. J. W., Mills, P. and Dawson, J. ‘The end of the p value?, British Heart Journal, 60, 177-180

(1988).

1910 D. G. ALTMAN

32. Hill, A. B. ‘Memories of the British streptomycin trial in tuberculosis. The first randomized clinical

33. Goodman, S. N. and Royall, R. ‘Evidence and scientific research’, American Journal ofpublic Health,

34. Clayton, D. and Kaldor, J. ‘Empirical Bayes estimates of age-standardized relative risks for use in

35. Simon, R. ‘Statistical tools for subset analysis in clinical trials’, Recent Results in Cancer Research, 111,

36. Pocock, S. J. and Hughes, M. D. ‘Estimation issues in clinical trials and overviews’, Statistics in Medicine, 9, 657-671 (1990).

37. Davis, C. E. and Leffingwell, D. P. ‘Empirical Bayes estimates of subgroup effects in clinical trials’, Controlled Clinical Trials, 11, 37-42 (1990).

38. Pocock S. J., Hughes, M. D. and Lee, R. J. ‘Statistical problems in the reporting of clinical trials. A survey of three medical journals’, New England Journal of Medicine, 317, 426-432 (1987).

39. Yusuf, S., Collins, R. and Peto, R. ‘Why do we need some large, simple randomized trials?, Statistics in Medicine, 3, 409420 (1984).

40. Dillman, R. O., Seagren, S. L., Propert, K. J., Guerra, J., Eaton, W. L., Perry, M. C., Carey, R. W., Frei, E. F. and Green, M. R. ‘A randomized trial of induction chemotherapy plus high-dose radiation versus radiation alone in stage 111 non-small-cell lung cancer’, New England Journal of Medicine, 323,930-935 (1991)

41. Souhami, R. L., Spiro, S. G. and Cullen, M. ‘Chemotherapy and radiation therapy as compared with radiation therapy in stage 111 non-small-cell cancer’, New England Journal oj’ Medicine, 324, 1 136 (1991).

42. Zelen, M. ‘Play-the-winner rule and the controlled clinical trial’, Journal of the American Statistical Association, 64, 131-146 (1969).

43. Simon, R. ‘Adaptive treatment assignment methods and clinical trials’, Biornetrics, 33, 743-749 ( 1977). 44. Zelen, M. ‘A new design for randomized clinical trials’, New England Journal of Medicine, 300,

45. Bartlett, R. H., Roloff, D. W., Cornell, R. G., Andrews, A. F., Dillon, P. W. and Zwischenberger, J. B. ‘Extracorporeal circulation in neonatal circulatory failure: a prospective randomized study’, Pediatrics, 76,479-487 (1985).

46. ORourke, P. P., Crone, R. K., Vacanti, J. P., Ware, J. H., Lillehei, C. W., Parad, R. P. and Epstein, M. F. ‘Extracorporeal membrane oxygenation and conventional medical therapy in neonates with persistent pulmonary hypertension of the newborn: a prospective randomized study’, Pediatrics, 84,

47. Ware, J. H. ‘Investigating therapies of potentially great benefit: ECMO’ (with comments and rejoinder), Statistical Science, 4, 298-340 (1989).

48. Elliott, S. J. ‘Neonatal extracorporeal membrane oxygenation: how not to assess novel technologies’, Lancet, 337, 476-478 ( 1 99 1).

49. Zelen, M. ‘Randomized consent designs for clinical trials: an update’, Statistics in Medicine, 9,645-656 (1990).

50. Ellenberg, S. S. ‘Randomization designs in comparative clinical trials’, New England Journal of Medicine, 310, 1404-1408 (1984).

51. Zelen, M. ‘Strategy and alternate randomized designs in cancer clinical trials’, Cancer Treatment Reports, 66, 1095-1100 (1982).

52. Byar, D. P., Schoenfeld, D. A., Green, S. B., Amato, D. A,, Davis, R., De Gruttola, V., Finkelstein, D. M., Gatsonis, C., Gelber, R. D., Lagakos, S., Lefkopoulou, M., Tsiatis, A. A., Zelen, M., Peto, J., Freedman, L. S., Gail, M., Simon, R., Ellenberg, S. S., Anderson, J. R., Collins, R., Peto, R. and Peto, T. ‘Design considerations for AIDS trials’, New England Journal of Medicine, 323, 1343-1348 (1990).

53. Pocock, S. J . and Simon, R. ‘Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial’, Biornetrics 31, 103-1 15 (1975).

54. Berry, D. A. ‘Comment: ethics and ECMO’, Statistical Science, 4, 306-310, (1990). 55. Green, S. B. and Byar, D. P. ‘Using observational data from registries to compare treatments: the

56. ‘Potentials and limitations of database research’, Statistics in Medicine, 10, 637-670 (1991). 57. Langman, M. J. S. ‘Towards estimation and confidence intervals’, British Medical Journal, 292, 716

trial’, Controlled Clinical Trials, 11, 77-79 (1990).

78, 1568-1574 (1988).

disease mapping’, Biometrics, 43, 67 1-681 (1987).

55-66 ( 1 988).

1242-1245 (1979).

957-963 (1989).

fallacy of omnimetrics’, Statistics in Medicine, 3, 361-370 (1984).

( 1 986).

STATISTICS IN MEDICAL JOURNALS: DEVELOPMENTS IN THE 1980s 191 1

58. Gardner, M. J. and Altman, D. G. ‘Estimating with confidence’, British Medical Journal, 296, 1210-1211 (1988).

59. Altman, D. G. ‘Statistics in medical journals’, Statistics in Medicine, 1, 57-71 (1982).

60. Simon, R. and Wittes, R. E. ‘Methodologic guidelines for reporting results of clinical trials’, Cancer Treatment Reports, 69, 1-3 (1985).

61. Reisch, J. S., Tyson, J. E. and Mize, S. G. ‘Aid to the evaluation of therapeutic studies’, Pediatrics, 84, 815-827 (1989).

62. Grant, A. ‘Reporting controlled trials’, British Journal of Obstetrics and Gynaecology, 96, 397-400 (1989).

63. Epidemiology Work Group of the Interagency Regulatory Liaison Group. ‘Guidelines for the documentation of epidemiological studies’, American Journal of Epidemiology, 114, 609-613 (1981).

64. Lichtenstein, M. J., Mulrow, C. D. and Elwood, P. C. ‘Guidelines for reading case-control studies’, Journal of Chronic Diseases, 40, 893-903 (1987).

65. Bracken, M. B. ‘Reporting observational studies’, British Journal of Obstetrics and Gynaecology, 96, 383-388 (1989).

66. Sheps, S. B. and Schechter, M. T. ‘The assessment of diagnostic tests. A survey of current medical research’, Journal of the American Medical Association, 252, 2418-2422 (1984).

67. Wald, N. and Cuckle, H. ‘Reporting the assessment of screening and diagnostic tests’, British Journal of Obstetrics and Gynaecology, 96, 389-396 (1989).

68. Sacks, H. R., Berrier, J., Reitman, D., Ancona-Berk, V. A. and Chalmers, T. C. ‘Meta-analyses of randomized controlled trials’, New England Journal of Medicine, 316, 450-455 (1987).

69. Altman, D. G., Gore, S. M., Gardner, M. J. and Pocock, S. J. ‘Statistical guidelines for contributors to medical journals’, British Medical Journal, 286, 1489-1493 (1983).

70. International Committee of Medical Journal Editors. ‘Uniform requirements for papers submitted to biomedical journals’, British Medical Journal, 296, 401-405 (1988) and Annals of Internal Medicine, 108, 258-265 (1988).

71. Bailar, J. C. and Mosteller, F. ‘Guidelines for statistical reporting in articles for medical journals. Amplifications and explanations’, Annals of Internal Medicine, 108, 266-273 (1988).

72. Gardner, M. J., Machin, D. and Campbell, M. J. ‘Use of check lists in assessing the statistical content of medical studies’, in Gardner, M. J. and Altman, D. G. (eds), Statistics with Confidence, British Medical Journal, London, 1989, pp. 101-108.


73. Altman, D. G. and Gardner, M. J. ‘Presentation of variability’, Lancet, ii, 639 (1986).

74. Huth, E. J. ‘Uniform requirements for manuscripts: the new, third edition’, Annals of Internal Medicine, 108, 298-299 (1988).

75. Anderson, J. R., Cain, K. C. and Gelber, R. D. ‘Analysis of survival by tumor response’, Journal of Clinical Oncology, 1, 710-719 (1983).

76. Weiss, G. B., Bunce, H. and Hokanson, J. A. ‘Comparing survival of responders and nonresponders after treatment: a potential source of confusion in interpreting cancer clinical trials’, Controlled Clinical Trials, 4, 43-52 (1983).

77. Bertino, J. R. ‘Guidelines for reporting clinical trials’, Journal of Clinical Oncology, 4, 1 (1986).

78. Quinn, M. A. and Campbell, J. J. ‘Tamoxifen therapy in advanced/recurrent endometrial carcinoma’, Gynecologic Oncology, 32, 1-3 (1989).

79. Propert, K. J. and Anderson, J. R. ‘Assessing the effect of toxicity on prognosis: methods of analysis and interpretation’, Journal of Clinical Oncology, 6, 868-870 (1988).

80. Christensen, J. ‘Sikkerhedsintervaller - redaktionens indstilling’ [Confidence intervals: the editors’ position], Ugeskrift for Laeger, 152, 2622 (1990).

81. Ludbrook, J. ‘Words on numbers: statistics in medical journals’, Australia and New Zealand Journal of Surgery, 59, 443-444 (1989).

82. Haynes, R. B., Mulrow, C. D., Huth, E. J., Altman, D. G. and Gardner, M. J. ‘More informative abstracts revisited’, Annals of Internal Medicine, 113, 69-76 (1990).

83. Melton, A. W. ‘Editorial’, Journal of Experimental Psychology, 64, 553-557 (1962).

84. Sterling, T. D. ‘Publication decisions and their possible effects on inferences drawn from tests of significance - or vice versa’, Journal of the American Statistical Association, 54, 30-34 (1959).

85. Begg, C. B. and Berlin, J. A. ‘Publication bias: a problem in interpreting medical data (with discussion)’, Journal of the Royal Statistical Society, Series A, 151, 419-463 (1988).

86. Easterbrook, P. J., Berlin, J. A., Gopalan, R. and Matthews, D. R. ‘Publication bias in clinical research’, Lancet, 337, 867-872 (1991).

87. Chalmers, I. ‘Biased touching of the editorial tiller’, Archives of Disease in Childhood, 60, 394 (1985).

88. Hawkins, D. F. ‘Clinical trials - meta-analysis, confidence limits and ‘intention to treat’ analysis’, Journal of Obstetrics and Gynaecology, 10, 259-260 (1990).

89. Newcombe, R. G. ‘Towards a reduction in publication bias’, British Medical Journal, 295, 656-659 (1987).

90. Siegel, J. P. ‘Editorial review of protocols for clinical trials’, New England Journal of Medicine, 323, 1355 (1990).

91. Gardner, M. J. and Bond, J. ‘An exploratory study of statistical assessment of papers published in the British Medical Journal’, Journal of the American Medical Association, 263, 1355-1357 (1990).

92. George, S. L. ‘Statistics in medical journals: a survey of current policies and proposals for editors’, Medical and Pediatric Oncology, 13, 109-112 (1985).

93. Hughes, W. T., Rivera, G. K., Schell, M. J., Thornton, D. and Lott, L. ‘Successful intermittent chemoprophylaxis for Pneumocystis carinii pneumonitis’, New England Journal of Medicine, 316, 1627-1632 (1987).

94. Kuntze, C. E. E., Ebels, T., Eijgelaar, A. and Homan van der Heide, J. N. ‘Rates of thromboembolism with three different mechanical heart valve prostheses: randomised study’, Lancet, i, 514-517 (1989).

95. Vaira, D., D’Anna, L., Ainley, C., Dowsett, J., Williams, S., Baillie, J., Cairns, S., Croker, J., Salmon, P., Cotton, P., Russell, C. and Hatfield, A. ‘Endoscopic sphincterotomy in 1000 consecutive patients’, Lancet, ii, 431-434 (1989).

96. Mossberg, H.-O. ‘40-year follow-up of overweight children’, Lancet, ii, 491-493 (1989).

97. Nio, Y., Imai, S., Shiraishi, T., Tsubono, M., Morimoto, H., Tseng, C.-C. and Tobe, T. ‘Chemosensitivity correlation between the primary tumors and simultaneous metastatic lymph nodes of patients evaluated by DNA synthesis inhibition assay’, Cancer, 65, 1273-1278 (1990).

98. Munley, A. J., Railton, R., Gray, W. M. and Carter, K. B. ‘Exposure of midwives to nitrous oxide in four hospitals’, British Medical Journal, 293, 1063-1064 (1986).

99. Soulillou, J.-P., Cantarovich, D., Le Mauff, B., Giral, M., Robillard, N., Hourmant, M., Hirn, M. and Jacques, Y. ‘Randomized controlled trial of a monoclonal antibody against the interleukin-2 receptor (33B3.1) as compared with rabbit antithymocyte globulin for prophylaxis against rejection of renal allografts’, New England Journal of Medicine, 322, 1175-1182 (1990).

100. Andrew, E., Eide, H., Fuglerud, P., Hagen, E. K., Kristoffersen, D. T., Lambrechts, M., Waaler, A. and Weibye, M. ‘Publications on clinical trials with X-ray contrast media: differences in quality between journals and decades’, European Journal of Radiology, 10, 92-97 (1990).

101. Liberati, A., Himel, H. N. and Chalmers, T. C. ‘A quality assessment of randomized control trials of primary treatment of breast cancer’, Journal of Clinical Oncology, 4, 942-951 (1986).

102. Senn, S. ‘The use of baselines in clinical trials of bronchodilators’, Statistics in Medicine, 8, 1339-1350 (1989).

103. Vaisrub, N. ‘Manuscript review from a statistician’s perspective’, Journal of the American Medical Association, 253, 3145-3147 (1985).

104. Murray, G. D. ‘The task of a statistical referee’, British Journal of Surgery, 75, 664-667 (1988).

105. Burnand, B., Kernan, W. N. and Feinstein, A. R. ‘Indexes and boundaries for “quantitative significance” in statistical decisions’, Journal of Clinical Epidemiology, 43, 1273-1284 (1990).

106. Oldham, P. D. ‘A note on the analysis of repeated measurements of the same subjects’, Journal of Chronic Diseases, 15, 969-977 (1962).

107. Hayes, R. J. ‘Methods for assessing whether change depends upon initial value’, Statistics in Medicine, 7, 915-927 (1988).

108. Nieto-Garcia, F. J. and Edwards, L. A. ‘On the spurious correlation between changes in blood pressure and initial values’, Journal of Clinical Epidemiology, 43, 727-728 (1990).

109. Musso, N. R., Deferrari, G., Pende, A., Vergassola, C., Saffioti, S., Gurreri, G. and Lotti, G. ‘Free and sulfoconjugated catecholamines in normotensive uremic patients: effects of hemodialysis’, Nephron, 51, 344-349 (1989).

110. Boer, P. ‘Misleading statistics: predialysis norepinephrine and its change after hemodialysis’, Nephron, 55, 78 (1990).

111. Musso, N. R., Deferrari, G., Pende, A., Vergassola, C., Saffioti, S., Gurreri, G. and Lotti, G. ‘Reply’, Nephron, 55, 79-80 (1990).

112. Altman, D. G. ‘Relating change to initial value’, Nephron, 59, 522 (1991).

113. Altman, D. G. and Bland, J. M. ‘Improving doctors’ understanding of statistics (with discussion)’, Journal of the Royal Statistical Society, Series A, 154, 223-267 (1991).

114. Barrett, J. F. R., Jarvis, G. J., Macdonald, H. N., Buchan, P. C., Tyrrell, S. N. and Lilford, R. J. ‘Inconsistencies in clinical decisions in obstetrics’, Lancet, 336, 549-551 (1990).

115. Esmail, A. and Bland, M. ‘Caesarean section for fetal distress’, Lancet, 336, 819 (1990).

116. Hill, A. B. ‘Principles of medical statistics. I-The aim of the statistical method’, Lancet, i, 41-43 (1937).

117. Medical Research Council. ‘Streptomycin treatment of pulmonary tuberculosis’, British Medical Journal, ii, 769-782 (1948).

118. Bland, J. M., Altman, D. G. and Royston, J. P. ‘Statisticians in medical schools’, Journal of the Royal College of Physicians, 24, 85-86 (1990).

119. ‘Statistical review for journals’ (editorial), Lancet, i, 84 (1991).

120. Lock, S. ‘Preface’, in Gore, S. M. and Altman, D. G., Statistics in Practice, British Medical Association, London, 1982.

121. Bland, J. M. and Altman, D. G. ‘Misleading statistics: the quality of textbooks in medical statistics’, International Journal of Epidemiology, 17, 245-247 (1988).

122. Thompson, A. L. Half a Century of Medical Research, Volume One: Origins and Policy of the Medical Research Council (UK), H.M.S.O., London, 1973, p. 28.

123. Anturane Reinfarction Trial Research Group. ‘Sulfinpyrazone in the prevention of sudden death after myocardial infarction’, New England Journal of Medicine, 302, 250-256 (1980).

124. Smithells, R. W., Sheppard, S., Schorah, C. J., Seller, M. J., Nevin, N. C., Harris, R., Read, A. P. and Fielding, D. W. ‘Possible prevention of neural-tube defects by periconceptional vitamin supplementation’, Lancet, i, 339-340 (1980).

125. Bagenal, F. S., Easton, D. F., Harris, E., Chilvers, C. E. D. and McElwain, T. J. ‘Survival of patients with breast cancer attending Bristol Cancer Help Centre’, Lancet, 336, 606-610 (1990).