
Tricks and traps in dental clinical trials


Methodology

James P. Carlos
Epidemiology and Oral Disease Prevention Program, National Institute of Dental Research, Bethesda, Maryland, USA

In view of the provocative title of this presentation, I imagine that some of you who are interested in clinical trials are eagerly waiting to hear about "tricks" that can be used in such research. I'm afraid that I shall have to disappoint you; there are far more "traps" than "tricks". The successful clinical trial is one which closely adheres to a fairly simple protocol with well-known characteristics; creativity and innovation on the part of the investigators are generally to be avoided. Nevertheless, there are a few adaptations of traditional trial methodology which might be considered, and I will mention some of them, along with some "traps", during the next few minutes.

At the outset it should be understood that, in my vocabulary, a clinical trial is a human study in which the investigator actively attempts to intervene to alter the course of a disease process. Some refer to such research as experimental epidemiology. Trials of potential caries-preventive agents or regimens are the obvious examples. Studies in which the investigator merely observes certain characteristics of a group of patients over time are more properly called either descriptive or analytical epidemiologic studies, and I will not be talking about those today.

In both medical and dental clinical trials the ideal research design, which we try to approach as closely as possible, is the randomized, double-blind trial. In this design, the assignment of subjects to treatment (or control) groups follows some probabilistic scheme, and neither investigator nor subject is aware of that assignment until the trial is complete. Practical considerations sometimes force departure from a completely "blinded" study, but we must be aware of the dangers of doing so. One cannot overemphasize the importance of randomization in clinical studies. It serves as the only protection against the effects of confounding factors, including those which may not be readily measured.

Intraclass correlation

I would like to spend a few minutes on several statistical considerations which arise in study design and analysis. The first of these follows from the peculiarities of dental data, and is known as intraclass correlation.

Dental trials differ in several important ways from those conducted by our medical colleagues. The most obvious difference is that, in our studies, we rarely make a single diagnosis for each patient in the trial. Instead, we make multiple diagnostic observations; one for each tooth or, more commonly, for each tooth surface in the mouth. Thus, in a typical caries trial, a total of 128 diagnoses are available for each patient, at each examination. The situation is analogous if we are measuring periodontal pocket depth or gingivitis. Although it may appear that the volume of data thus generated provides much more sensitive information than simply categorizing each subject as "diseased" or "disease-free", this is not entirely true, because the observations may be highly correlated within each subject. This is partly compensated for by summing or otherwise manipulating each set of measurements to arrive at a single score for each subject, DMFS for example.

Presented at symposium on: Problems Related to Clinical Research in Dentistry, February 15, 1984, Stockholm, Sweden, arranged by the Swedish Medical Research Council. This paper has been submitted upon the request of the Editor.

Now, this practice has been routine for so long in caries trials that you may wonder why I bother to mention it. I do so because a study of periodontal attachment levels was recently published, in which multiple measurements within the same mouth were treated as independent observations for statistical analysis. As a result of ignoring intraclass correlation, the significance levels reported for the data were far too small, and the conclusions were therefore suspect. Here is a "trap" to be assiduously avoided.
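The cost of ignoring intraclass correlation can be made concrete with the standard design-effect approximation, 1 + (m - 1)ρ, for m observations per subject sharing an intraclass correlation ρ. The following is a minimal sketch and is not taken from the study referred to above: the function names, the 100 subjects and the correlation of 0.3 are illustrative assumptions, and every subject is assumed to contribute the same number of sites.

```python
def design_effect(sites_per_subject: int, icc: float) -> float:
    """Variance inflation for clustered observations: 1 + (m - 1) * rho."""
    return 1.0 + (sites_per_subject - 1) * icc


def effective_sample_size(subjects: int, sites_per_subject: int, icc: float) -> float:
    """Number of truly independent observations the clustered data are worth."""
    total_observations = subjects * sites_per_subject
    return total_observations / design_effect(sites_per_subject, icc)


# Hypothetical trial: 100 subjects, 128 surfaces each, assumed ICC of 0.3.
n_eff = effective_sample_size(100, 128, 0.3)
print(f"nominal observations: {100 * 128}, effective: {n_eff:.0f}")
```

Under these assumed values the 12,800 surface diagnoses carry roughly the information of a few hundred independent ones, which is why treating each site as independent overstates significance.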

Diagnostic inconsistency

Any analysis of clinical trial results is based upon the assumption that identical diagnostic standards were used to obtain each set of measurements during the course of the study. Although statistical methods will allow for random error in measurements, systematic error (and, of course, examiner bias) is assumed to be absent. In practice, however, it appears that investigators sometimes give inadequate attention to ensuring that examiners, especially when more than one is involved, are applying the same diagnostic standards with the same degree of sensitivity and, equally important, that these standards remain constant throughout the trial; that there is no "examiner drift".

There are no "tricks" to solve this problem. It is simply essential that examiners be carefully calibrated, not only at the beginning of a trial but repeatedly throughout the study, and that any evidence of inconsistency be promptly resolved. Further, it is necessary to report evidence to satisfy the reader that both between-examiner and within-examiner diagnostic disagreement is small, relative to the differences in disease observed among study groups. Failure to demonstrate this necessarily raises doubts about the validity of results of a trial. Yet, it is often ignored.
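One common way to report examiner agreement is Cohen's kappa, which corrects the observed proportion of agreement for the agreement expected by chance. No particular statistic is prescribed in this talk, so the choice of kappa, the function name and the ten duplicate diagnoses below are assumptions made purely for illustration.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two sets of categorical diagnoses."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both examiners must score the same non-empty set of sites")
    n = len(ratings_a)
    # Proportion of sites on which the two sets of diagnoses agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each examiner's marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both examiners used a single, identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical duplicate examinations of ten surfaces (S = sound, C = carious).
examiner_1 = list("SSCCSCSSCS")
examiner_2 = list("SSCSSCSCCS")
print(f"kappa = {cohen_kappa(examiner_1, examiner_2):.2f}")
```

The same function serves for within-examiner checks if the two lists are one examiner's first and repeat diagnoses of the same sites.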

The discriminatory power of a trial

A clinical trial is usually designed to permit a statistical test of the null hypothesis that the intervention being studied is without effect on the course of the disease; that is, that there was no real difference in the final disease status of the various groups in the study. If we reject this hypothesis and conclude that a treatment is effective, we wish to know the chances that we are wrong in our conclusions. This chance or probability is called α and is customarily set at 5%. Conversely, we also need to know the probability that a particular study design will result in acceptance of the null hypothesis (that is, concluding that "the treatment was ineffective") when, in fact, it should have been rejected. This probability of failing to detect a truly effective treatment is called β. Thus, 1 - β is the probability that the trial will detect a treatment effect if one exists. 1 - β is usually called the "power" of a study. The choice of α and β, together with the expected magnitude of treatment effect, determines the number of subjects required for the trial. For any given real difference between treatment and control groups, the chance of detecting it, the power, increases with the size of the study groups.

These, of course, are fundamental and well-known principles of study design.
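The relation between group size and power can be sketched with a normal approximation to the two-group comparison of means. This is an illustrative assumption rather than the method behind any figure quoted in this talk: the function name, the standardized difference of 0.5 and the group sizes are hypothetical, and a common standard deviation in both groups is assumed.

```python
from math import sqrt
from statistics import NormalDist


def power_two_group(n_per_group, delta, sd, alpha=0.05):
    """Approximate power of a two-sided test for a difference in means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # about 1.96 when alpha = 0.05
    se = sd * sqrt(2.0 / n_per_group)      # standard error of the difference
    shift = abs(delta) / se
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical true difference of half a standard deviation.
for n in (20, 64, 200):
    print(n, round(power_two_group(n, delta=0.5, sd=1.0), 2))
```

With no true difference the function returns α itself, and for a fixed difference the returned power rises toward 1 as the groups grow.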

It turns out, however, that in dental caries clinical trials, the concept of power has recently assumed new importance. This is the result of two developments: first, the general trend toward declining caries prevalence, and therefore caries incidence, among children in industrialized countries, and, second, ethical considerations which dictate that placebo-treated control groups are often no longer permissible, but that a new treatment must be tested against some other treatment of already established efficacy. Both these developments tend to result in clinical trials in which the difference in DMF increments among groups is relatively small compared to those seen 10 or 15 yr ago, even when the treatment being tested is truly better. There are many recent reports of trials in which no treatment effect was established statistically, although there was a clear trend favoring the treatment group. One suspects that these studies lacked sufficient power, that is, the number of subjects per group was too small.

Let me give a numerical example to illustrate how easy it is to fall into the "trap" of inadequate power in a caries trial. Consider a simple, two-group, treatment vs. control study. We follow convention and set α = 0.05. Further, we decide that we can live with a 10% chance of missing a real treatment effect, if one exists. Thus β = 0.10 and 1 - β ("the power of the study") = 90%. How many subjects per group are needed to achieve this power? It depends, of course, on the size of the treatment-control difference that we wish to detect.

Let us suppose that we expect the treatment to achieve a reduction in caries incidence of 30% compared to the control, a not unreasonable assumption a few years ago. In that case we require 176 subjects in each treatment group (1). This is not very different from the size of many caries trials reported in the literature.

On the other hand, it is now 1984, and we are interested in (say) comparing a new fluoride dentifrice against one which is already known to be effective. A 10% improvement would be clinically meaningful and that is what we wish to detect. To do so, with the same power, will require 1913 children in each group (1)! That number will need to be increased to the extent that any attrition of study groups during the trial is anticipated. Any trial which is begun with a smaller number of subjects is probably doomed to failure from the outset.

I think this will suffice to make the point that inadequate consideration of power when designing a clinical study is a major trap which grows more dangerous every time a new trial is contemplated.
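The way the required group size grows as the detectable difference shrinks can be sketched with the textbook normal-approximation formula, n = 2(z_α + z_β)²(σ/δ)² per group. The figures of 176 and 1913 come from reference (1) and rest on assumptions about caries increments that are not given here, so this sketch is not expected to reproduce them. The function name and the control-group mean increment and standard deviation (both set to 3.0 surfaces) are hypothetical values chosen only to show the shape of the relationship.

```python
from math import ceil
from statistics import NormalDist


def subjects_per_group(delta, sd, alpha=0.05, power=0.90):
    """Per-group n for a two-sided, two-group comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = z.inv_cdf(power)            # quantile corresponding to 1 - beta
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypothetical control group: mean increment 3.0 surfaces, SD 3.0 surfaces.
control_mean, sd = 3.0, 3.0
for reduction in (0.30, 0.10):
    delta = reduction * control_mean
    print(f"{reduction:.0%} reduction: {subjects_per_group(delta, sd)} per group")
```

Because n varies with 1/δ² in this approximation, cutting the detectable reduction from 30% to 10% multiplies the required group size by about nine when the standard deviation is held fixed.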

Inappropriate response variable

As a diversion, and a rest, from statistical considerations, I wish to consider for a moment a clinical question. That is the selection of the appropriate response variable in a clinical trial. This problem seems far more important in studies concerned with some aspects of gingival or periodontal diseases than with studies of dental caries.

For example, we have all seen reports of trials of potential anti-plaque agents in which the response variable used was an index of the extent of plaque on the tooth, or the wet or dry weight of plaque harvested from teeth. Frequently the investigators have concluded, after large and expensive studies and with appropriate statistical tests, that the test agent was "significantly" effective in reducing plaque. My reaction to these reports is, "so what? Did the agent prevent disease? Will it improve gingival health? Did the treatment affect the bacterial or biochemical ecology of the plaque?" We have no way of knowing, because these variables were not measured. Those of you not familiar with the dental clinical trial literature may find it difficult to believe that such largely meaningless trials have been carried out. Unfortunately, they have, and they are still being carried out today, especially by commercial sponsors who are naturally attracted to yet untapped markets for anti-bacterial aids to oral hygiene. Needless to say, studies intended solely to determine the effect of an agent on the quantity of dental plaque should not be carried out. Perhaps I would be less adamant should someone demonstrate a reduction in plaque of 100%, but, so far, no one has.

Inappropriate statistical tests

Let me now comment on one other very common "trap" in dental clinical trials (then I will mention a few possible "tricks").

To perform a statistical test of the null hypothesis in a simple, two-group treatment-control study is rather straightforward. Typically, the difference in mean caries increments observed in the two groups is tested by a t-test, with α = 0.05. The only question which arises is whether a one-sided or two-sided test should be used. On that point, I quote from the sensible, if conservative, advice of Armitage (2):

"Before the data are examined, one should decide to use a one-sided test only if it is quite certain that departures in one particular direction (for example, if the treatment group experiences a higher caries increment than the controls) will always be ascribed to chance, and therefore regarded as non-significant, however large they are. This situation rarely arises in practice, and it will be safe to assume that significance tests would almost always be two-sided".

I concede, however, that in our studies, our prior knowledge of the possible biologic effects of a test agent may be sometimes so extensive that a decision to use a one-sided test can be justified.

In these times, however, it is rare to come across a simple, two-group trial. Considerations of time, manpower and money commonly give rise to trials in which there are several groups given different treatments (or different dosages of the same treatment), along with a single control. Very often, the analysis of results of these trials has consisted of comparing the mean DMFS increment of each treatment group with that of the control, using multiple t-tests. Sometimes comparisons are made between treatment group means as well. If the value of "t" exceeds some critical level (around 1.96 when α = 0.05) the result is said to be significant. This is a common practice. It is also wrong!

As soon as multiple t-tests are made of the same data set, the critical value of "t" rapidly increases. Depending upon the number of tests made, a value of "t" that the investigator reports as "significant at the 5% level" may actually not even reach significance at the 20% or 30% level of confidence. Both the investigator and the reader are badly misled.
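The inflation can be illustrated by the chance of at least one falsely "significant" result among k tests, 1 - (1 - α)^k. This is only a rough sketch: the formula assumes the tests are independent, which comparisons sharing one control group are not, and the function name is hypothetical.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


for k in (1, 3, 5, 7):
    print(f"{k} tests: {familywise_error_rate(k):.1%}")
```

Under that independence assumption, five tests at the nominal 5% level already carry an overall error rate above 20%, and seven tests carry one of about 30%.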

There are, of course, a number of well-known procedures to permit multiple comparisons of group means at any desired value of α. These include, for example, Dunnett's modification of the t-test (3) and the "Bonferroni criterion" (4). What is disturbing is the number of published reports of dental clinical trials which continue to ignore this problem.
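The Bonferroni criterion is the simpler of the two to sketch: each of k comparisons is tested at α/k, which raises the critical value. The example below uses the large-sample normal critical value in place of the t distribution, an assumption made for brevity, and the function name is hypothetical. Dunnett's procedure requires its own tables and is not shown.

```python
from statistics import NormalDist


def bonferroni_critical_z(num_comparisons: int, alpha: float = 0.05) -> float:
    """Two-sided large-sample critical value when each test is run at alpha / k."""
    per_test_alpha = alpha / num_comparisons
    return NormalDist().inv_cdf(1 - per_test_alpha / 2)


for k in (1, 3, 6):
    print(f"{k} comparisons: |t| must exceed {bonferroni_critical_z(k):.2f}")
```

With a single comparison the familiar value of about 1.96 is recovered; with three treatment groups each compared with one control, the threshold rises to roughly 2.39.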

"Non-reactive" subjects

Now let us turn to a common problem for which a solution has recently been proposed, a "trap" which possibly may be avoided by a "trick".

The problem arises when, during the course of a trial, the control group has such a low increment of caries that it is virtually impossible statistically to demonstrate a treatment advantage no matter how effective the treatment actually is. This has often happened and has resulted in clinical trials which have ended with equivocal results. I suspect that, if caries continues to decline, this problem will grow increasingly serious.

By examining such data sets, one can see that a substantial number of children in the control group had zero or very small caries increments during the study. They are "non-reactors" who contribute little or no information to the test of the hypothesis, but merely add to the cost of the study. A possible solution, the "trick", is pre-selection of subjects to include in the study only those who are likely to have the highest caries increments. This approach has been recommended by our colleagues at the University of Manchester in the UK, who have shown substantial improvements in trial efficiency by selecting participants within a narrow age-range (10-12 yr appears best), and by selecting those with the highest prior caries experience (5). Previous research has shown that prior caries experience can predict about 20% of the variance in new caries incidence over a 2-yr period. Recently, these investigators have reported further gains in efficiency by eliminating occlusal surfaces of first molars from the calculation of prior caries experience, as these surfaces are likely to be already carious or highly caries-resistant by age 10. This approach looks promising, but needs to be investigated in other populations.

Work being done in Göteborg may permit even more effective pre-selection of susceptible subjects. Those investigators have suggested that there may be good correlations between the numbers of Streptococcus mutans and lactobacilli in saliva and subsequent caries incidence. Workers in Umeå are extending this idea to include salivary flow, buffer capacity and frequency of sugar intake as possible predictor variables. One intent is to identify subjects most in need of intensive preventive therapy, but there is no reason why the same technique, if validated, could not be used to assist in selecting subjects for clinical trials. Perhaps, in the future, this will be known as the "Swedish Trick".

Investigation of concomitant variables

Are there any other "tricks" to improve the conduct of our clinical trials? Yes, there is one, which I strongly advocate.

That is the collection of data on concomitant variables to supplement the clinical data on the incidence of disease. If, for example, the trial is testing a fluoride compound, then it would be useful and wise to determine whether the concentration of fluoride in plaque was increased in treated subjects. In a clinical trial of an antimicrobial, a report of a "significant" result would be greatly strengthened by demonstrating a direct effect on the pathogenic microflora. The collection and presentation of such parallel data would greatly facilitate interpretation of a trial in which the statistical tests of clinical data were of borderline significance. In general, I feel that in future trials we need a much greater emphasis on demonstrating a plausible mechanism of action of a new agent, in addition to showing the end result of that action. We will have far more confidence in our research if we routinely incorporate such procedures.

There is much, much more that could be said about improving clinical trials in dentistry. I have touched upon only a random set of problems, and superficially at that. A much more thorough treatment of some of these matters will shortly appear in a special issue of the Journal of Dental Research which will report the proceedings of an international conference on caries clinical trials held last year in Chicago. We are now planning a similar meeting on periodontal clinical trials.

If there is any moral in all of this, it is probably that the most successful trials are those in which as much emphasis is given to prior planning and design, as to the conduct of the study and the data analysis. Most of the "traps" have already been identified and can be avoided by the cautious, well-informed investigator. No doubt, however, there are many "tricks" still to be learned.

References

1. Kingman A. Adequate cohort sizes for caries clinical trials. Community Dent Oral Epidemiol [year illegible]: 6: 30-5.
2. Armitage P. Statistical methods in medical research. New York: John Wiley and Sons, 1971.
3. Dunnett CW. A multiple comparison procedure for comparing several treatments with a control. J Am Stat Assoc 1955: 50: 1096-121.
4. Miller RG. Simultaneous statistical inference. New York: McGraw-Hill, 1966: 67-70.
5. Downer MC, Mitropoulos CM. Improvement in selection of study participants. Proceedings of a Conference on Caries Clinical Trials. J Dent Res 1984: in press.
