PHARMACEUTICAL STATISTICS
Pharmaceut. Statist. 2005; 4: 221–224
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/pst.177
Literature Review March–June 2005
Simon Day1,*,† and Meinhard Kieser2
1Medicines and Healthcare products Regulatory Agency, Room 13-205, Market Towers, 1 Nine Elms Lane, London SW8 5NQ, UK
2Department of Biometry, Dr Willmar Schwabe Pharmaceuticals, Karlsruhe, Germany
INTRODUCTION
This review covers the following journals received during the
period from the middle of March 2005 to the middle of June 2005:
* Applied Statistics, volume 54, part 3.
* Biometrical Journal, volume 47, part 2.
* Biometrics, volume 61, part 1.
* Biometrika, volume 92, part 1.
* Biostatistics, volume 6, part 2.
* Clinical Trials, volume 2, parts 2, 2s.
* Communications in Statistics – Simulation and Computation, volume 34, parts 1, 2.
* Communications in Statistics – Theory and Methods, volume 34, parts 1–5.
* Computational Statistics & Data Analysis, volume 49, parts 2–4.
* Drug Information Journal, volume 39, part 2.
* Journal of Biopharmaceutical Statistics, volume 15, parts 2, 3.
* Journal of the American Statistical Association, volume 100, part 2.
* Journal of the Royal Statistical Society, Series A, volume 168, parts 2, 3.
* Statistics in Medicine, volume 24, parts 8–13.
* Statistical Methods in Medical Research, volume 14, part 2.
SELECTED HIGHLIGHTS FROM THE LITERATURE
A collection of nine papers was published in Clinical Trials:
the proceedings of a workshop organised by the UK Medical
Research Council on cluster randomized trials. An introductory
editorial sets the scene:
* Moulton LH. A practical look at cluster-randomized trials
(editorial). Clinical Trials 2005; 2:89–90.
Part 2, 2005, of the Journal of Biopharmaceutical Statistics is a
special issue devoted to non-clinical statistical applications in
the pharmaceutical industry. A wide range of topics is covered,
spanning the whole development process of a drug from the
discovery phase to manufacturing.
* Journal of Biopharmaceutical Statistics, 15:193–373.
Phase I
The paper by Zhou reviews Bayesian decision-making procedures
for ‘first-into-man’ phase I dose-escalation studies with a
binary response. Such studies are mostly performed in oncology
where doses identified in preclinical trials are administered to
patients for whom other treatments have failed. The objective is
to find a dose that is high enough to be effective but which
at the same time avoids a risk of toxicity. In a simulation
study, the performance of the Bayesian approach is investigated
under six scenarios (‘very safe’ to ‘very toxic’) to compare the
impact of different choices of the number of dose levels and
cohort sizes.
* Zhou Y. Choosing the number of doses and the cohort size
for phase 1 dose-escalation studies. Drug Information
Journal 2005; 39:125–137.
While many papers have been published that deal with
the design and analysis of phase I trials in cancer,
comparatively little attention has been given to phase I studies
in other indications. The exception proves the rule: Kang and
Ahn investigate the statistical properties of a design that is
frequently used to determine the maximum tolerated dose
(MTD) in patients with Alzheimer’s disease. Performing phase I
dose-finding studies directly in patients is particularly
advisable in this indication, as it has been shown in the
past that the MTD in Alzheimer’s disease patients may be
substantially higher than the MTD determined in normal
volunteers.
* Kang S-H, Ahn C. The expected toxicity rate at the
maximum tolerated dose in bridging studies in Alzheimer’s
disease. Drug Information Journal 2005; 39:149–157.
Copyright © 2005 John Wiley & Sons, Ltd.
†E-mail: [email protected]
*Correspondence to: Simon Day, Medicines and Healthcare products Regulatory Agency, Room 13-205, Market Towers, 1 Nine Elms Lane, London SW8 5NQ, UK.
Phase II
One problem is to find the best combination of doses of two
agents – Wang and Ivanova take a slightly simpler problem
where fixed doses of one agent are available and the objective is
to find the best dose of the second agent so as to obtain a
specified toxicity profile. The methods used are described as
more efficient than running several studies at each of the
specified doses of the first agent:
* Wang K, Ivanova A. Two-dimensional dose finding in
discrete dose space. Biometrics 2005; 61:217–222.
Multiplicity
Control of error rates in trials with multiple outcomes,
comparisons, analyses, etc. is an important problem. Whilst
the term ‘gatekeeping’ is used in various contexts, multiplicity is
the meaning for Chen et al. They develop methods with better
power properties than others by considering the interrelationships
between the various primary and secondary significance tests.
* Chen X, Luo X, Capizzi T. The application of enhanced
parallel gatekeeping strategies. Statistics in Medicine 2005;
24:1385–1397.
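The gatekeeping idea itself can be sketched in a few lines. The following is a minimal Bonferroni-based parallel gatekeeping sketch for two families, not the enhanced strategies of Chen et al.: each primary hypothesis receives an equal share of alpha, and the alpha of rejected primary hypotheses is passed on to the secondary family (the p-values and levels below are illustrative assumptions).

```python
def parallel_gatekeeping(primary_p, secondary_p, alpha=0.05):
    """Minimal Bonferroni-based parallel gatekeeping for two families.

    Each primary hypothesis is tested at alpha / n_primary; the alpha of
    rejected primary hypotheses is released to the secondary family,
    which is again split by Bonferroni. If no primary hypothesis is
    rejected, the gate stays closed and no secondary test can reject.
    """
    share = alpha / len(primary_p)
    primary_rej = [p <= share for p in primary_p]
    released = share * sum(primary_rej)       # alpha carried to stage 2
    sec_share = released / len(secondary_p) if secondary_p else 0.0
    secondary_rej = [p <= sec_share for p in secondary_p]
    return primary_rej, secondary_rej

# One of two primary tests succeeds, releasing alpha/2 to the secondary family
primary, secondary = parallel_gatekeeping([0.010, 0.040], [0.010, 0.200])
```

The point of the construction is visible in the last line: a secondary endpoint can only be declared significant to the extent that the primary family has already released alpha.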
Spurrier considers the setting where the aim is to demonstrate
that at least one of several experimental treatments is
better than the best of several controls. The approach
can be viewed as an extension of Dunnett’s many-to-one
comparison procedure. Normal and distribution-free testing
methods are derived that maintain the experimentwise error
rate in the strong sense. By inversion of the tests, simultaneous
confidence intervals are obtained.
* Spurrier JD. Multiple comparisons with the best control in a
one-way layout. Communications in Statistics – Theory and
Methods 2005; 34:651–660.
Sample size calculation and recalculation
Have we not adequately covered estimating the variance from
ongoing trials (with a view to reassessing sample size)? Xing and
Ganju add to the literature on this:
* Xing B, Ganju J. A method to estimate the variance of
an endpoint from an on-going blinded trial. Statistics in
Medicine 2005; 24:1807–1814.
The following paper derives a sample size formula for studies
where the accuracy of different diagnostic tests is compared
and where multiple samples come from the same patient. The
approach takes into account the various sources of correlation
that are present in such settings, and simulations show that the
method works well.
* Liu A, Schisterman EF, Mazumdar M, Hu J. Power and
sample size calculation of comparative diagnostic accuracy
studies with multiple correlated test results. Biometrical
Journal 2005; 47:140–150.
Willan and Pinto take a very pragmatic approach and
consider the costs of trials and the ability to make practical
decisions, as well as the standard Type I and II error rates:
* Willan AR, Pinto EM. The value of information and
optimal clinical trial design. Statistics in Medicine 2005;
24:1791–1806.
Interim analyses
We start with a pessimistic view – interim analyses for futility.
Timing of interim analyses is often an administrative issue, but
Gould looks at the science and gives helpful recommendations.
Most notably, and helpfully, a good rule of thumb is not to
carry out interim futility analyses when you have less than
about 40% of the data from the trial. Before then, it is hardly
worth it:
* Gould AL. Timing of futility analyses for ‘proof of concept’
trials. Statistics in Medicine 2005; 24:1815–1835.
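A common metric computed at such a futility look is the conditional power under the current trend, expressed through the interim z-statistic Z(t) at information fraction t. The sketch below is a standard textbook formulation, not taken from Gould's paper, and the one-sided final significance level of 0.025 is an assumption:

```python
import math

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def conditional_power(t, z_interim, z_crit=1.959964):
    """Conditional power under the current trend at information fraction t.

    CP(t) = 1 - Phi((z_crit - Z(t)/sqrt(t)) / sqrt(1 - t)), where z_crit
    is the final critical value (one-sided alpha = 0.025 assumed here).
    """
    drift = z_interim / math.sqrt(t)   # effect estimated from the data so far
    return 1.0 - norm_cdf((z_crit - drift) / math.sqrt(1.0 - t))

# With 40% of the information and no observed effect, the trial looks futile
cp_null_trend = conditional_power(0.4, 0.0)
```

Early looks (small t) make the drift estimate Z(t)/sqrt(t) very noisy, which is one intuition behind waiting for a substantial information fraction before acting on futility.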
In the pharmaceutical industry, everybody seems to be
talking about acceleration of drug development. An attractive
concept to achieve this goal is the combination of phase II
and phase III aspects in a study with a two-stage design.
Conceptually, these designs randomize patients to experi-
mental treatments and a control in the first stage, select
promising treatments at the interim analysis based on the
information available then, and randomize the patients to this
subset of treatments in the second stage. Data from both stages
are used for final decision-making. High hopes are attached to
this appealing approach of integrating the aspects of learning
and confirming in a single trial, as reflected by the fact that
three papers on this topic have recently been published. The
proposals differ in the extent of flexibility they
provide, for example with respect to the selection rule
and options for design adaptations. The first two papers deal
with the situation that a ‘provisional’ short-term (surrogate)
endpoint and a long-term clinical endpoint are available.
This enables use of information from interim analyses even
if the duration of patient enrolment is short relative to the
follow-up period.
* Liu Q, Pledger GW. Phase 2 and 3 combination designs to accelerate drug development. Journal of the American Statistical Association 2005; 100:493–502.
* Todd S, Stallard N. A new clinical trial design combining phases 2 and 3: sequential designs with treatment selection and a change of endpoint. Drug Information Journal 2005; 39:109–118.
* Bischoff W, Miller F. Adaptive two-stage test procedures to
find the best treatment in clinical trials. Biometrika 2005;
92:213–227.
‘Overrunning’ in interim analyses is a particular problem
in cases of fast recruitment and a comparatively long
treatment phase: additional data may accumulate for some
time after the formal stopping rule has been reached. The
following paper proposes a method that deals with this
situation and that is tailored to avoid the awkward case of
conflicting decisions based on the interim data and the final
analysis set:
* Wust K, Kieser M. Monitoring continuous long-term outcomes in adaptive designs. Communications in Statistics – Simulation and Computation 2005; 34:321–341.
Chen proposes a strategy for interim analysis that can be
applied in studies with binary endpoints where an exact test is
used for the analysis. Due to the discreteness of the response,
there is generally an ‘unused’ type I error rate. Hence, this ‘free’
alpha can be spent in an interim analysis without any penalty for the
final test. This idea is illustrated for a single-arm study with the
Pearson–Clopper test and a two-arm study with Fisher’s exact
test.
* Chen C. A note on penalty-free interim analyses of clinical
studies with binary response. Biometrical Journal 2005;
47:194–198.
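The 'unused' type I error rate from a discrete test is easy to see numerically. The sketch below is not the construction in Chen's paper; the single-arm sample size, null response rate and nominal level are illustrative assumptions. It computes the exact size of a one-sided binomial test and the alpha left over:

```python
from math import comb

def binom_tail(k, n, p):
    # P(X >= k) for X ~ Binomial(n, p)
    return sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k, n + 1))

n, p0, alpha = 20, 0.30, 0.05                 # illustrative single-arm setting
# smallest rejection threshold whose exact size stays below the nominal level
k_crit = next(k for k in range(n + 1) if binom_tail(k, n, p0) <= alpha)
actual_size = binom_tail(k_crit, n, p0)
unused_alpha = alpha - actual_size            # 'free' alpha for an interim look
```

Because the rejection region can only grow in whole outcomes, the exact size here is about 0.048 rather than 0.05; the difference is the 'free' alpha that Chen proposes to spend at an interim analysis.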
Data Analysis Issues
There are many situations where we are interested in the
outcome for only a subset of patients in a trial. Although
mortality is usually the most important endpoint, sometimes we
wish to know answers to questions such as ‘of those who
survived, which treatment gives best quality of life?’ (or some
other relevant endpoint). Of course, we lose the benefit of
randomized comparisons as soon as we condition on the
survival endpoint; Hayden et al. describe approaches to this
problem:
* Hayden D, Pauler DK, Schoenfeld D. An estimator for
treatment comparisons among survivors in randomized
trials. Biometrics 2005; 61:305–310.
Selection bias in trials is a subject that has been much ignored,
mostly because people do not think it can happen with
appropriate blinding and randomization; but it probably can.
If you need convincing, see Berger’s recent book Selection Bias
and Covariate Imbalances in Randomized Clinical Trials (Wiley,
2005); the following papers (as well as material in the book)
then discuss its consequences and solutions. In the first paper,
Berger challenges the view that the impact of selection bias is
negligible in randomized and blinded trials by quantifying the
extent of baseline imbalance that can result from its presence
when using the randomized blocks procedure. The article is
accompanied by enlightening discussion contributions by
Douglas Altman, Joachim Roehmel and Stephen Senn. The
second paper focuses on procedures for the detection of
selection bias and for an appropriate adjustment of treatment
group comparisons.
* Berger VW. Quantifying the magnitude of baseline covariate imbalances resulting from selection bias in randomized clinical trials (with discussion). Biometrical Journal 2005; 47:119–139.
* Ivanova A, Barrier RC, Berger VW. Adjusting for observable selection bias in block randomized trials. Statistics in Medicine 2005; 24:1537–1546.
In the following paper, ‘administrative’ data are taken to be
epidemiological- or demographic-type data (data from hospital
discharge summaries, for example). It is often easy and
inexpensive to get lots of these data so it is obvious to ask
how they might be combined with ‘clinical’ data (here taken
to mean the data from individuals in a study). Austin et al.’s
proposal is to do so via propensity scores.
* Austin PC, Mamdani MM, Stukel TA, Anderson GM,
Tu JV. The use of the propensity score for estimating
treatment effects: administrative versus clinical data.
Statistics in Medicine 2005; 24:1563–1578.
The next paper gives a helpful summary and comparison of
methods for calculating Poisson confidence intervals that have
been proposed in the literature:
* Byrne J, Kabaila P. Comparison of Poisson confidence
intervals. Communications in Statistics – Theory and
Methods 2005; 34:545–556.
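One interval commonly included in such comparisons is the exact interval obtained by inverting the Poisson tail probabilities. The sketch below is a minimal stdlib implementation by bisection, not taken from the paper; the search bracket is an assumption that is generous for small counts:

```python
import math

def pois_cdf(x, mu):
    # P(X <= x) for X ~ Poisson(mu)
    return math.exp(-mu) * sum(mu**i / math.factorial(i) for i in range(x + 1))

def exact_poisson_ci(x, alpha=0.05):
    """Exact (Garwood-type) CI for a Poisson mean from a single count x.

    The lower limit solves P(X >= x | mu) = alpha/2 and the upper limit
    solves P(X <= x | mu) = alpha/2; both are found by bisection.
    """
    hi = 10.0 * x + 10.0                      # assumed generous search bracket

    def solve(below):                         # root where below(mu) flips False
        lo_, hi_ = 0.0, hi
        for _ in range(200):
            mid = 0.5 * (lo_ + hi_)
            if below(mid):
                lo_ = mid
            else:
                hi_ = mid
        return 0.5 * (lo_ + hi_)

    lower = 0.0 if x == 0 else solve(lambda mu: 1.0 - pois_cdf(x - 1, mu) < alpha / 2)
    upper = solve(lambda mu: pois_cdf(x, mu) > alpha / 2)
    return lower, upper

lo, up = exact_poisson_ci(10)  # roughly (4.80, 18.39)
```

The exact interval is conservative by construction; much of the literature the paper surveys concerns shorter approximate intervals that trade a little coverage for length.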
Meta-Analysis
Meta-analysts fall into two groups – those analysing individual
patients’ data and those combining summary statistics
across studies. Some meta-analysts, of course, sit happily in
either camp. Pharmaceutical companies often have the
advantage of holding all the original raw data from all
the studies, and for them the paper by Tudur Smith et al. may
be of some interest. The problem described by Williamson and
Gamble may be one less often associated with meta-analyses
carried out by companies because it describes the case of only
having selected reporting (of the most positive results?) in
published papers.
* Tudur Smith C, Williamson PR, Marson AG. Investigating
heterogeneity in an individual patient data meta-analysis of
time to event outcomes. Statistics in Medicine 2005;
24:1307–1319.
* Williamson PR, Gamble C. Identification and impact of
outcome selection bias in meta-analysis. Statistics in
Medicine 2005; 24:1547–1561.
Pharmacovigilance
The assessment of the potential of a drug to delay ventricular
repolarization has gained increased attention through the
recently drafted ICH E14 guideline on this subject. The analysis
of the QT interval (the time required for ventricular depolarization
and repolarization) has become a necessary step in
safety evaluation. The QT interval is inversely related to the
heart rate. It is current practice to use one of the available
methods to correct the QT interval for the effects of the heart
rate and to base the assessment on the corrected QTc interval.
The paper by Wei and Chen gives an overview of the commonly
applied correction methods and points out the limitations of
these procedures. As an alternative, a model-based correction
approach is proposed and its application is illustrated with an
example.
* Wei GCG, Chen JYH. Model-based correction to the QT
interval for heart rate for assessing mean QT interval change
due to drug effect. Drug Information Journal 2005; 39:
139–148.
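The fixed-exponent corrections that the paper contrasts with its model-based alternative all follow one generic form. The sketch below shows the two most widely used choices, Bazett and Fridericia; these are standard formulas rather than anything taken from the paper, and the example QT and heart-rate values are illustrative:

```python
def qtc(qt_ms, heart_rate_bpm, exponent):
    """Fixed-exponent QT correction: QTc = QT / RR**exponent (RR in seconds)."""
    rr_s = 60.0 / heart_rate_bpm              # RR interval from heart rate
    return qt_ms / rr_s ** exponent

qt_ms, hr = 400.0, 75.0                       # illustrative QT (ms) and heart rate (bpm)
qtc_bazett = qtc(qt_ms, hr, 1.0 / 2.0)        # Bazett: exponent 1/2
qtc_fridericia = qtc(qt_ms, hr, 1.0 / 3.0)    # Fridericia: exponent 1/3
# at 60 bpm (RR = 1 s) no correction is applied and QTc equals QT
```

The limitation the paper points out is visible here: a single fixed exponent is assumed to hold for all subjects and all heart rates, which is what motivates a model-based correction fitted to the data.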
Regulatory issues
We put this next collection (a paper, commentaries and a
rejoinder) under the heading of ‘regulatory’ issues, although
it is as much to do with what constitutes reliable and convincing
evidence as anything else. It relates to the ‘two trials’ paradigm
(‘paradigm’ being the term the authors use). Shun et al. are
approaching the subject from a formal statistical point of view,
rather than a more general, less formal, consideration of the
value of reproducibility of evidence. Considerations of one, or
two, populations are central. A good two-way discussion with
sensible ‘plus’ points and ‘minus’ points ensues: Koch is broadly
supportive, Huque less so.
* Shun Z, Chi E, Durrleman S, Fisher L. Statistical consideration of the strategy for demonstrating clinical evidence of effectiveness – one larger vs two smaller pivotal studies. Statistics in Medicine 2005; 24:1619–1637.
* Koch GG, Huque MF. Commentaries on ‘Statistical consideration of the strategy for demonstrating clinical evidence of effectiveness – one larger vs two smaller pivotal studies’. Statistics in Medicine 2005; 24:1639–1651.
* Shun Z, Chi E, Durrleman S, Fisher L. Rejoinder: Statistical consideration of the strategy for demonstrating clinical evidence of effectiveness – one larger vs two smaller pivotal studies. Statistics in Medicine 2005; 24:1652–1656.
Cost-effectiveness
All multinational studies of costs (with or without effectiveness)
suffer from the problem that costs and cost structures differ so
widely between countries. You can pool data across countries
and risk estimating a cost that is applicable nowhere, or stratify
and lose the benefits of sample size from a bigger study. Pinto
et al. compromise and use shrinkage estimates that have both
benefits of being country-specific and of smaller variance than
those of simple stratified analyses. Perhaps compromising a
little on both facets may produce the ‘best’ (in some very
informal sense) estimate. Some of the algebra is a little heavy
going but worth the trouble if you work in this area. Even if you
do not, there is a long list of interesting and useful references,
some of which may be worth following up.
* Pinto EM, Willan AR, O’Brien BJ. Cost–effectiveness
analysis for multinational clinical trials. Statistics in
Medicine 2005; 24:1965–1982.
Miscellaneous
Pharmacogenetics is (apparently) the brave new world for
clinical trialists. Some statisticians/trialists are already involved
– others are sitting on the sidelines. An interesting broad
discussion of some of the issues might help bring some of us up
to speed:
* Kelly PJ, Stallard N, Whittaker JC. Statistical design and
analysis of pharmacogenetic trials. Statistics in Medicine
2005; 24:1495–1508.
Some people advocate the use of Microsoft Excel for
statistical analyses, and courses on this subject are offered.
McCullough and Wilson investigated Excel 2003 with respect to
the accuracy of some statistical distributions, estimation
(summary statistics, ANOVA, linear and nonlinear regression)
and random number generation. They compared the results
with those obtained from previous versions. In all three areas
they found that ‘the performance of Excel ... is still inadequate’.
Potential users would do well to take a look at the paper before
using the statistical procedures in the software.
* McCullough BD, Wilson B. On the accuracy of statistical
procedures in Microsoft Excel 2003. Computational Statis-
tics & Data Analysis 2005; 49:1244–1252.
Finally, presidential addresses usually contain interesting
material that is not always as technically demanding on the
reader as many other papers. Where is our profession going?
Are we losing bits? Are we gaining bits? Geert Molenberghs, the
outgoing president of the International Biometric Society, gives
his views:
* Molenberghs G. Biometry, biometrics, bioinformatics, ..., bio-X. Biometrics 2005; 61:1–9.