Design Options in Epidemiologic Research



  • 8/12/2019 Design Options in Epidemiologic Research


    Print ISSN: 0355-3140 Electronic ISSN: 1795-990X Copyright (c) Scandinavian Journal of Work, Environment & Health

    Downloaded from www.sjweh.fi on March 29, 2014

    Original article

    Scand J Work Environ Health 1982;8 suppl 1:7-14

    Design options in epidemiologic research. An update. by Miettinen O

    This article in PubMed: www.ncbi.nlm.nih.gov/pubmed/6980462


    HONORARY GUEST LECTURE
    Scand j work environ health 8 (1982): suppl 1, 7-14

    Design options in epidemiologic research: An update by Olli Miettinen, MD, PhD¹

    MIETTINEN O. Design options in epidemiologic research: An update. Scand j work environ health 8 (1982): suppl 1, 7-14.

    I felt embarrassed about the prospect of giving a lecture to such a learned audience, especially an honorary lecture. I had to find something beyond the ordinary, and I entertained very seriously some topical problem areas in methodology, circumscribed and somewhat esoteric. At the same time, I continued to be very preoccupied with something more fundamental, the options in epidemiologic study design, which is an aspect of my current research interest. I had an urge to talk about this latter topic but felt insecure about my mastery of the issues. I finally felt confident enough to dare attempt an update - and indeed a revision - of my previous teachings in this area. Not only is the topic important in its own right, but its review gains added urgency from the fact that so many of you are familiar with my past approach in the International Advanced Course in Epidemiology that the Institute of Occupational Health in Helsinki has sponsored over the years.

    ¹ Departments of Epidemiology and Biostatistics, School of Public Health, Harvard University, Boston, Massachusetts, United States. Reprint requests to: Dr O Miettinen, Departments of Epidemiology and Biostatistics, School of Public Health, Harvard University, 677 Huntington Avenue, Boston, MA 02115, USA.

    Preliminaries

    Given the meaning of design in general, study design may be taken to mean a vision of the end-product of a study on one hand and a scheme for carrying out a study on the other.

    In epidemiologic research the concern is with the occurrence of events and states of illness and health in man. The magnitude of any parameter of such occurrence generally depends on various particulars of people's constitutions, behaviors, and/or environments. Therefore the quantification of any given occurrence parameter is, in general, a matter of relating its magnitude to the various determinants on which it depends. Such relationships, or occurrence functions, thus constitute the general formal object of epidemiologic research.

    The function of concern in any given study may be abstract-general (divorced from time and place) or particularistic. Either way, the direct yield of the study is a particularistic function, one that is specific for the population experience that formed the base of the study, and thereby is the direct referent of its results. Such an empirical occurrence function - or qualitative information about it - can be thought of as the direct result of an epidemiologic study.

    When the aforegiven general meaning of study design is applied to this formulation of the direct result of an epidemiologic study, the broadest aspects of epidemiologic study design may be said to include the stipulations of (i) the type of occurrence function (empirical) to be derived, (ii) the type (and size) of the population experience that is to form the empirical base - and thereby the direct referent - of the function, and (iii) the type of sampling that is to be used in the ascertainment of the occurrence pattern in the base. Our concern, then, is with the options in each of these aspects of design.

    Occurrence functions

    When the health state or outcome at issue is viewed as an all-or-none characteristic, the occurrence or outcome parameter may be taken as a rate of either prevalence or incidence as a matter of design options rather than as simply different types of Fragestellung (formulation of the research question). In the context of a quantitative health characteristic of individuals, the equivalent of a prevalence study is the assessment of parameters of the distribution of the characteristic (mean, etc) among people. The counterpart of incidence studies in such a case is the study of the distribution of changes in the characteristic over a period of time.

    Whatever type of outcome parameter P is considered, its relationship to the determinants D considered is viewed in either descriptive or causal terms, and this duality of interpretation has bearing on the structure of the function as well. A descriptive function is simply of the form

    $$P = f(D), \qquad \text{(eq 1)}$$

    where D represents a set of determinants (D1, D2, ...) and f the functional relationship. By contrast, a causal relationship between the parameter and any given determinant depends on modifiers M of the effect, and it is expressed conditionally on confounders C:

    $$P = f(D \mid M) \mid C. \qquad \text{(eq 2)}$$

    It may be worthy of emphasis that even the latter function is a directly empirical one; the causal-inferential judgement comes to bear on it in the selection of the set of confounders on which it is conditioned - a question of study design and data analysis.

    The realizations of both the outcome and the determinants tend to be functions of time (age, duration of follow-up, etc). In terms of the interrelationship of their time referents, one may opt for either a cross-sectional or a longitudinal function. In a cross-sectional function these time referents are the same:

    $$P_t = f(D_t), \qquad \text{(eq 3)}$$

    ie, the value of the outcome parameter at any given point in time (T = t) is related to the realization(s) of the determinant(s) at that same time. In a longitudinal function the time referent of the determinant value(s) is previous to that of the parameter:

    $$P_t = f(D_T), \quad T < t. \qquad \text{(eq 4)}$$

    Example 1. In the Collaborative Perinatal Study (3) the main concern was with potential teratogenic effects of maternal drug use. Theoretically, incidence of malformations over the period of organogenesis could have been related to drug exposure at that time, ie, a cross-sectional incidence function could theoretically have been taken as the object of the study. The study actually addressed the prevalence of malformations and other anomalies in the postnatal period in relation to fetal drug exposure and other factors, ie, a longitudinal prevalence function.

    It should be noted that even though causal functions are theoretically longitudinal, consideration of practicalities may lead to the pursuit of a cross-sectional empirical function.

    Example 2. The Framingham Heart Study (2) was mainly concerned with the occurrence of coronary heart disease in terms of both descriptive and causal-interpretative functions. It could, theoretically, have focused on prevalence, but it concentrated on incidence/risk. When, in that study or in any study, the incidence over a particular span of time (5-a incidence, say) is expressed as a function of age at the beginning of the risk period and the values of other determinants at that age, the incidence function is totally cross-sectional. When values before that age are taken into account, the function is longitudinal in terms of the given definition.
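The difference between the cross-sectional function of eq 3 and the longitudinal function of eq 4 lies only in which time referent of the determinant is paired with the outcome. The Python sketch below is a hypothetical illustration, not taken from either study: the record layout (subject id mapped to `{age: (determinant, outcome)}`), the function names `occurrence_pairs` and `prevalence_by_determinant`, and the toy data are all assumptions. The `lag` parameter switches between the two kinds of function.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: subject id -> {age: (determinant value, outcome 0/1)}
History = Dict[int, Tuple[int, int]]


def occurrence_pairs(records: Dict[str, History], t: int, lag: int = 0) -> List[Tuple[int, int]]:
    """Pair each subject's outcome at time t with its determinant at t - lag.

    lag == 0 mimics a cross-sectional function (eq 3: D_t with P_t);
    lag > 0 mimics a longitudinal function (eq 4: D_T with P_t, T < t).
    Subjects lacking either time point are skipped.
    """
    pairs = []
    for history in records.values():
        if t in history and (t - lag) in history:
            pairs.append((history[t - lag][0], history[t][1]))
    return pairs


def prevalence_by_determinant(pairs: List[Tuple[int, int]]) -> Dict[int, float]:
    """Empirical prevalence of the outcome within each determinant level."""
    totals: Dict[int, List[int]] = {}
    for d, y in pairs:
        n_k = totals.setdefault(d, [0, 0])  # [subjects, outcomes]
        n_k[0] += 1
        n_k[1] += y
    return {d: k / n for d, (n, k) in totals.items()}


# Toy data: subject "a" changes determinant level between ages 40 and 45.
records = {
    "a": {40: (1, 0), 45: (0, 1)},
    "b": {40: (1, 0), 45: (1, 1)},
    "c": {40: (0, 0), 45: (0, 0)},
}
cross = prevalence_by_determinant(occurrence_pairs(records, t=45))
longi = prevalence_by_determinant(occurrence_pairs(records, t=45, lag=5))
print(cross)  # {0: 0.5, 1: 1.0}
print(longi)  # {1: 1.0, 0: 0.0}
```

The same subjects and the same outcomes yield different empirical occurrence functions depending solely on the chosen time referent of the determinant.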

    Base

    In a prevalence study the population experience that constitutes its base may be a cross-section of a population, ie, such that each member of the population is considered at one point in time (age, time since first exposure) only. In such a study the concern is the health status at that time and the realization(s) for the determinant(s) at that time (cross-sectional function) or at a previous time (longitudinal function). With the subjects distributed over time, a cross-sectional experience can provide for studying even time itself as a determinant of the occurrence parameter. An alternative is to consider the experience of a cohort as it moves over the time span under study.

    Example 3. In the Collaborative Perinatal Study the health outcome was studied from birth to 7 a of age. One option would have been to study each member of the cohort of newborns only once in that span of age, with a suitable scatter of subjects within the range, ie, to examine only a cross-section (an oblique one) of the cohort. Actually, the cohort was followed, by means of periodic examinations, from birth to 7 a of age, ie, the full cohort experience was observed.

    An incidence study cannot be based on a cross-section of a population; the observation of transitions (from health to illness, say) requires longitudinal population experience, as in the movement of a cohort over time. An alternative to a cohort base is the experience of a dynamic population over time.

    Example 4. While, in the Framingham Heart Study, the base was taken as a 1948 cohort of residents of the town, an alternative would have been to follow the dynamic population of Framingham - by repeatedly surveying for the determinants under study and maintaining a register of coronary events. (The experience of the 1948 cohort, now rapidly fading, would have been subsumed under such a dynamic base, potentially studiable in perpetuity.)

    Whatever the dynamics of the base in the aforegiven terms, its distribution according to any given determinant and its respective modifiers and confounders in the function, the design matrix, is, in principle at least, for the investigator to decide on. In nonexperimental research the decisions are implemented by selectivity only, and thus the main options in this regard are nonselective and selective distribution (or matrix). For a determinant under study, selectivity means pursuit of greater variability. This definition applies to modifiers as well, given that modification is actually studied; if it is not, the distribution of the modifier may be constrained to a narrow range. In the case of a confounder there is no point in maximizing variability, whereas restricting range is a means of control.

    Example 5. In the Collaborative Perinatal Study, expectant mothers were enrolled, and their pregnancies and offspring followed, regardless of what their drug use was in early pregnancy. An alternative would have been to be selective according to drug use - taking, say, all users of drugs of interest (such that their use is reasonably common) and only a sample of those who did not use any drugs.

    Example 6. In the Framingham Heart Study, the screenees were enrolled without any selectivity according to the determinants of interest. Among the alternatives would have been to take people in the extremes of each determinant (two-point design), possibly supplemented by a sample from the middle of the distribution (three-point design). Similarly, within the broad age range of admissibility, age being a potential modifier of major interest, the cohort was totally nonselective. Again, the alternatives would have included the two-point design, etc.

    In situations in which a single determinant is under study, matching by modifier(s) and/or confounder(s) represents an added form of selectivity in the formation of the study base.

    In nonexperimental research, as in general, the choice of the design matrix has to do with study efficiency (in the sense of amount of information per subject).
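The sense in which selectivity by the determinant buys information per subject can be made concrete under an added assumption that the text does not make: a linear occurrence function P = a + bD fitted by least squares with constant residual variance σ². Then Var(b̂) = σ²/S_xx, where S_xx is the sum of squared deviations of the determinant values, so S_xx per subject serves as a crude index of efficiency. The Python sketch below (function name and designs hypothetical) compares a nonselective spread over a range with the three-point and two-point designs mentioned in Example 6.

```python
from statistics import fmean
from typing import Sequence


def slope_information(design: Sequence[float]) -> float:
    """S_xx = sum of squared deviations of the determinant values.

    Assuming P = a + b*D with constant residual variance sigma^2,
    Var(b_hat) = sigma^2 / S_xx, so S_xx measures information about b.
    """
    m = fmean(design)
    return sum((x - m) ** 2 for x in design)


# Six subjects each, determinant range 0..10 (hypothetical units).
designs = {
    "nonselective": [0, 2, 4, 6, 8, 10],
    "three-point": [0, 0, 5, 5, 10, 10],
    "two-point": [0, 0, 0, 10, 10, 10],
}
for name, d in designs.items():
    sxx = slope_information(d)
    print(f"{name:<12} S_xx = {sxx:5.1f}  per subject = {sxx / len(d):5.2f}")
```

Here the two-point design carries roughly twice the information per subject of the nonselective spread (S_xx of 150 versus 70), at the price of being unable to reveal nonlinearity; the three-point design gives up some of that gain (S_xx of 100) to retain a middle point.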

    The studied occurrence function generally involves but a few of a multitude of determinants of the parameter at issue. All the other determinants jointly determine the background level of the rate (or other parameter), say the intercept of a rate function. It is a question of design to choose the preferred background level of the parameter. For example, in the evaluation of preventives (factors capable of neutralizing otherwise sufficient causes), it is commonly preferred to use a base with a high background rate for the outcome at issue.
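Why a high background rate helps in evaluating a preventive can be seen from the expected numbers of cases. The Python sketch below assumes, beyond what the text states, equal-sized index and reference categories, a risk-ratio (cumulative incidence) parameter, and the usual large-sample log-method variance 1/C1 - 1/B1 + 1/C0 - 1/B0; the function names and numbers are hypothetical.

```python
import math


def se_ln_risk_ratio(C1: float, B1: float, C0: float, B0: float) -> float:
    """Large-sample SE of ln[(C1/B1)/(C0/B0)] for cumulative incidence,
    using the log-method variance 1/C1 - 1/B1 + 1/C0 - 1/B0."""
    return math.sqrt(1 / C1 - 1 / B1 + 1 / C0 - 1 / B0)


def expected_se(background_risk: float, rr: float, n_per_group: int) -> float:
    """Expected SE when the reference category has the background risk and
    the index (preventive-exposed) category has risk rr * background_risk."""
    C0 = background_risk * n_per_group
    C1 = rr * background_risk * n_per_group
    return se_ln_risk_ratio(C1, n_per_group, C0, n_per_group)


for risk in (0.01, 0.10):
    print(f"background risk {risk:.2f}: SE(ln RR) = {expected_se(risk, 0.5, 1000):.3f}")
```

With 1000 subjects per category and a preventive that halves the risk (both hypothetical), raising the background risk from 1% to 10% shrinks the expected variance of ln RR from 0.298 to 0.028, roughly a tenfold gain in information from the same number of subjects.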

    The placement of the study base in time involves, in nonexperimental research, the choice between retrospective and prospective options - given that the research problem is scientific (abstract-general). If it is particularistic, as in the evaluation of health practices, then the problem itself determines the time (and place) for the experience to be studied.

    Example 7. The Collaborative Perinatal Study could not have been based on any cohort experience (from birth to 7 a of age) in the past because information about drug exposure in early pregnancy could not have been obtained.

    (Even the use of a prospective cohort (of newborns) would not have been a solution per se. A partial solution would have been the use of a prospective cohort with the mothers interviewed in an appropriate manner in the early postpartum period. In point of fact, the prospective placement of the cohort was exploited even further; a setting was created in which newborn babies had recorded histories of drug exposure in early pregnancy. Mothers were enrolled already during pregnancy, and drug uses were recorded forthwith on entry, with updates later in pregnancy.)

    Example 8. For the Framingham Heart Study, information on the determinants of interest was not available retrospectively. It was, therefore, made available by means of examinations on the prospective cohort that formed the base of that study. [The same problem could have been solved by the use of a prospective dynamic population (cf example 4).]

    As for the place in which the study base is located, options analogous to the options in time exist on the same condition. However, these options do not reduce to a single dichotomy analogous to that for time.

    Evidently, the options in time and place have implications for quality of information and efficiency in the sense of cost per subject. In addition, selection of time and/or place can be used as a means of attaining a desired design matrix and/or level of background rate.

    Representation

    The base (including its size) having been defined, it remains to ascertain what the empirical occurrence function in it was (retrospective base) or will turn out to be (prospective base). To this end, one needs to learn about numerators and denominators of rates - how the cases and the base, respectively, are distributed over the determinant, modifiers, and confounders. One way to achieve this information is the use of a census: each subject in the base is examined as to all the pertinent facts - determinant(s), modifiers, confounders, and outcome. An alternative is outcome-selective sampling, ie, the use of a case-referent (case-control) approach. It is to be noted that sampling according to the determinant(s)/modifiers/confounders is not an alternative to the census approach in the context of abstract objectives; it is tantamount to reducing the base of the sample.

    Outcome-selective sampling is customarily thought of in terms of a census (or possibly a sample) of the cases of illness together with a sample of the noncases (1, 4, 5, 8). Consider a base experience as laid out in panel A of table 1, ie, a base which is either a cross-section of a population (prevalence study) or a cohort experience (incidence study), with a binary determinant and outcome. For the base the rate ratio contrasting the index category (D = 1) to the reference category (D = 0) is

    $$\mathrm{RR} = \frac{C_1/B_1}{C_0/B_0} = \frac{C_1/C_0}{B_1/B_0}. \qquad \text{(eq 5)}$$

    The ratio C1/C0 is estimable from the case series, and, if the illness is rare, B1/B0 can be estimated from the series of noncases (1). Thus, using the notation in panel B of table 1,

    $$\widehat{\mathrm{RR}} = \frac{c_1/c_0}{n_1/n_0}. \qquad \text{(eq 6)}$$

    The aforementioned, customary type of outcome-selective study in the case of a cross-sectional or cohort base has an alternative which seems not to have received proper attention: replacement of the sample of noncases by a sample of the base. In terms of the notation in panel C of table 1, this design provides the estimate

    $$\widehat{\mathrm{RR}} = \frac{c_1/c_0}{b_1/b_0}. \qquad \text{(eq 7)}$$

    This estimate, in contrast to the one from the design with noncases (equation 6), does not depend on any rare-disease assumption. Its statistical treatment is outlined in appendix 1.
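The contrast between the noncase reference series (eq 6) and the base-sample reference series (eq 7) can be checked arithmetically against the base rate ratio (eq 5). The Python sketch below uses a hypothetical cohort base with a common outcome and, to isolate the systematic difference, takes the 10% reference samples at their expected composition rather than drawing them at random; the function names are illustrative only.

```python
from fractions import Fraction


def base_rate_ratio(C1, C0, B1, B0):
    """Eq 5: RR = (C1/B1) / (C0/B0), computed on the whole base."""
    return Fraction(C1, B1) / Fraction(C0, B0)


def case_noncase_estimate(c1, c0, n1, n0):
    """Eq 6: reference series is a sample of noncases."""
    return Fraction(c1, c0) / Fraction(n1, n0)


def case_base_estimate(c1, c0, b1, b0):
    """Eq 7: reference series is a sample of the base itself."""
    return Fraction(c1, c0) / Fraction(b1, b0)


# Hypothetical cohort base with a common outcome (risks 0.30 vs 0.10).
B1, B0 = 1000, 9000
C1, C0 = 300, 900
N1, N0 = B1 - C1, B0 - C0          # 700, 8100 noncases

# Census of cases; a 10% reference sample at its expected composition.
n1, n0 = N1 // 10, N0 // 10        # 70, 810 sampled noncases
b1, b0 = B1 // 10, B0 // 10        # 100, 900 sampled base members

print(base_rate_ratio(C1, C0, B1, B0))        # 3
print(case_noncase_estimate(C1, C0, n1, n0))  # 27/7
print(case_base_estimate(C1, C0, b1, b0))     # 3
```

Only the case-base estimate recovers RR = 3; the case-noncase estimate returns the odds ratio (0.30/0.70)/(0.10/0.90) = 27/7, about 3.86, which approaches RR only as the outcome becomes rare.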

    The distinction between the presented two ways of defining the reference series in outcome-selective sampling is a nonissue in the context of incidence studies with a dynamic base; the noncases are a sample of the base of candidates for incident cases, and equation 6 as well as equation 7 gives an estimate of the incidence-density ratio without any rare-disease assumption (7).

    Example 9. In the Collaborative Perinatal Study the census approach to the experience of the cohort was employed. All information on drug exposure, etc, was secured and processed for each baby in the study, and even a very detailed editing, referring back to the original data sheets, employed this census approach (3). An alternative would have been simply to file the prenatal records, then ascertain the health outcome on each baby/child, and finally process and analyze the data on all cases (representing problems frequent enough for meaningful study) and on a sample of the base cohort of newborns.

    Example 10. Had the Framingham Heart Study been carried out in terms of a dynamic base as outlined in example 4, the register data would presumably have been processed in detail, while the survey results would ideally have been processed routinely to a minimal, necessary extent only. For example, electrocardiograms would have been filed away without any readings, etc. Any given analysis would have been based on a case series (census, from the register) together with a sample of the (dynamic) base, drawn on the basis of the rosters of screenees simultaneously with the appearances of the cases (time-matching).

    Outcome-selective series may, of course, be drawn with or without matching on modifiers and/or confounders. [However, matching on factors that are not part of the occurrence function can be counterproductive in terms of efficiency in studies of this type (6).]

    With or without matching, which means selectivity in the sampling of potential reference subjects only, it may be desirable to employ selectivity for both series - index (case) as well as reference (noncase or base) series - according to the determinant and/or the modifiers.

    Table 1. Layout of numbers of subjects of different types in a cross-sectional or cohort base and also in outcome-selective samples.

    A. Base experience
                      Determinant (D)
                      D = 1   D = 0   Other   Total
        Cases         C1      C0      C'      C
        Noncases      N1      N0      N'      N
        Base          B1      B0      B'      B

    B. Samples of cases and noncases
                      Determinant (D)
                      D = 1   D = 0   Other   Total
        Cases         c1      c0      c'      c
        Noncases      n1      n0      n'      n
        Total         t1      t0      t'      t

    C. Samples of cases and base
                      Determinant (D)
                      D = 1   D = 0   Other   Total
        Cases         c1      c0      c'      c
        Base          b1      b0      b'      b
          Cases       c1*     c0*     c*'     c*
          Noncases    n1      n0      n'      n

    Consider first the added selectivity by the determinant in an already outcome-selective study. Commonly the interest is in a rare exposure, so that B1 is very small relative to B0. In such a case, a two-stage sampling strategy may be attractive (in terms of efficiency). The first-stage sampling, nonselective as to the determinant, could be used to identify the exposure status (of cases and of reference subjects). In the second stage, only a sample of the nonexposed would be selected, randomly, from the nonexposed in the first-stage sample. If the sampling fractions for the nonexposed in the case and noncase series are f_c and f_n, respectively, the second-stage sample sizes c0 and n0 being these fractions of the numbers of nonexposed cases and noncases in the first-stage sample, then the estimate in equation 6 is replaced by

    $$\widehat{\mathrm{RR}} = \frac{c_1/c_0}{n_1/n_0} \cdot \frac{f_c}{f_n}. \qquad \text{(eq 8)}$$

    Similarly, if a sample of the base is used instead of a series of noncases and if the sampling fractions of the nonexposed index (case) and reference (base) subjects in the first-stage sample are f_c and f_b, respectively, then the estimate in equation 7 is replaced by

    $$\widehat{\mathrm{RR}} = \frac{c_1/c_0}{b_1/b_0} \cdot \frac{f_c}{f_b}, \qquad \text{(eq 9)}$$

    where b0 is the second-stage number of nonexposed base subjects, a fraction f_b of those in the first-stage sample. Statistical aspects of these two estimators are outlined in appendix 2.
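Equations 8 and 9 simply undo the thinning of the nonexposed by multiplying the crude ratio by f_c/f_n or f_c/f_b. The Python sketch below applies that correction to hypothetical first-stage case-base samples for a rare exposure; the function name and counts are assumptions, and the second-stage counts are taken as exact fractions of the first-stage ones so that the arithmetic is transparent.

```python
from fractions import Fraction


def two_stage_estimate(c1, c0, r1, r0, f_c, f_r):
    """Eq 8 (r = noncases, f_r = f_n) or eq 9 (r = base sample, f_r = f_b).

    c1, r1: exposed cases / reference subjects, all retained.
    c0, r0: second-stage counts of nonexposed subjects, drawn at random
            with fractions f_c and f_r from the first-stage nonexposed.
    """
    crude = (Fraction(c1) / c0) / (Fraction(r1) / r0)
    return crude * Fraction(f_c) / Fraction(f_r)


# Hypothetical first-stage case-base samples for a rare exposure.
c1, c0_first = 30, 300
b1, b0_first = 20, 1800
f_c, f_b = Fraction(1, 5), Fraction(1, 30)

# Second stage: keep only these fractions of the nonexposed.
c0 = int(c0_first * f_c)   # 60
b0 = int(b0_first * f_b)   # 60

first_stage = Fraction(c1, c0_first) / Fraction(b1, b0_first)
print(first_stage, two_stage_estimate(c1, c0, b1, b0, f_c, f_b))  # 9 9
```

The corrected estimate reproduces the first-stage estimate of 9, while only 60 rather than 300 nonexposed cases and 60 rather than 1800 nonexposed base members remain for any expensive further data acquisition; with random second-stage draws the agreement would hold only approximately.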

    Added selectivity by determinant in an already outcome-selective study deserves consideration in situations in which, after the initial selection and ascertainment of exposure status, expensive data acquisition remains to be done. This situation may concern verification or details of diagnosis, or it may deal with modifiers and/or confounders. Also, if exposure is very common so that the exposed are sampled, the data acquisition of concern may deal with details of the exposure.

    Analogously with determinant selectivity, selectivity by modifier in an already outcome-selective study is aimed at increasing the variability of the modifier in the final series so as to increase the amount of information about modification per subject in those series. Thus, in the modifier domain in which the base is scarce, all cases are enrolled in the first stage of determinant-selective sampling, while elsewhere only a fraction of the available cases are drawn into the index series. The size of the reference series in the different domains of the modifier in such a study would generally be proportional to that of the index series (matching).

    Example 11. Suppose the Framingham Heart Study was carried out in terms of a dynamic base and case-base sampling, as outlined in examples 4 and 10. Suppose further that people with a history of coronary heart disease (CHD) were excluded neither from the case register nor from the periodic surveys of the population. Somewhere along the way one might have wished to examine serum cholesterol level as a determinant of acute coronary events, with history of CHD as a modifier of interest. The cases would presumably have been quite nicely (evenly) distributed between the two categories of the modifier (positive and negative history), so that cases would have been enrolled in the spirit of a census (without selectivity by history). On the other hand, the base sample would have had a very lopsided distribution by the modifier in the absence of matching by it. For some other potential modifiers, such as age, the case series too might have been formed in a selective fashion.

    Epilogue

    From the preceding analysis it is evident that the core issue in epidemiologic study design is not the choice between cohort and case-referent studies, contrary to a prevalent belief. Indeed, cohort and case-referent studies are not even alternatives to each other. For a cohort experience, which is a type of study base, the alternatives are a dynamic population experience or a population cross-section, while for a case-referent approach to the ascertainment of the base experience the only alternative is the use of a census.

    In the formation of the base, epidemiologists still have a lot to learn from laboratory experimenters, especially in the employment of an efficient design matrix. Even in clinical trials that are immensely expensive it is still customary to stipulate only the ranges of age and other modifiers, with no selectivity within the range. (Rarely do laboratory investigators purchase animals from a store simply stipulating a wide range of age or weight and then accept, within that range, a totally arbitrary distribution; it would be recognized as obviously inelegant and inefficient.)

    Conversely, experimenters are very committed to the census approach to the ascertainment of the experience of any group of subjects, whether animals or humans, and they might learn from epidemiologists the efficient approach of outcome selectivity.

    Even in epidemiology, the use of outcome-selective studies is still rather primitive. The reference series is routinely taken as a series of noncases, even when a sample of the base would be preferable. Moreover, the efficiency potential of further selectivities according to the determinant or modifiers under study seems not to have been realized.

    It has been my purpose to draw attention to the various design alternatives that are available in epidemiologic research. Mere awareness of them, I believe, will lead to more rational choices in study design - with occasionally very major savings through enhanced efficiency.

    Acknowledgment

    This work has been supported by grant number 5-P01-CA06373 from the National Institutes of Health.

    References

    1. Cornfield J. A method of estimating comparative rates from clinical data: Applications to cancer of the lung, breast and cervix. J natl cancer inst 11 (1951) 1269-1275.
    2. Dawber TR, Meadors GF, Moore FE. Epidemiological approaches to heart disease: The Framingham study. Am j publ health 41 (1951) 279-286.
    3. Heinonen OP, Slone D, Shapiro S. Birth defects and drugs in pregnancy. Ed. David W. Kaufman (3rd printing). PSG Publishing Company Inc, Littleton, MA 1977. 510 p.
    4. Lilienfeld AM. Foundations of epidemiology. Oxford University Press, New York, NY 1976. 283 p.
    5. MacMahon B, Pugh TF. Epidemiology: Principles and methods. Little, Brown and Co, Boston, MA 1970.
    6. Miettinen OS. Matching and design efficiency in retrospective studies. Am j epidemiol 91 (1970) 111-118.
    7. Miettinen OS. Estimability and estimation in case-referent studies. Am j epidemiol 103 (1976) 226-235.
    8. Prentice RL. Logistic disease incidence models and case-control studies. Biometrika 66 (1979) 403-411.

    Appendix 1: Analysis under case-base selectivity

    For the case-base strategy of sampling, consider the data layout in panel C of table 1 (p 11), with the following refinement of definitions: if the base sample brings up cases that were not included in the original case series itself, such added cases will be included in the final case series, ie, in the first row of the data layout. Consequently, the cases appearing in the base sample can be thought of as a subset of the final case series, ie, as redundant cases.

    In significance testing, the redundant cases are to be omitted, ie, the final case series is to be compared with the noncase subset of the base sample. Thus, the concern is with a layout of the form in panel B of table 1 (p 11). In those terms, the large-sample test is based on a Gaussian approximation to a hypergeometric model for the distribution of c1, following Mantel-Haenszel (1). The chi-square statistic, with one degree of freedom, is

    $$\chi^2 = \frac{(c_1 - c\,t_1/t)^2}{c\,n\,t_1 t_0 / t^3}. \qquad \text{(eq A)}$$

    For the logarithm of the point estimator of the rate ratio (equation 7, p 10), the variance may be derived in terms of a first-order Taylor series approximation with allowance for the correlation between c1 and b1 (conditionally on c and b). The result is

    $$\widehat{\mathrm{Var}}(\ln \widehat{\mathrm{RR}}) = \frac{1}{c_1} + \frac{1}{c_0} + \left(1 - \frac{2c^*}{c}\right)\left(\frac{1}{b_1} + \frac{1}{b_0}\right). \qquad \text{(eq B)}$$

    Thus, 100(1 - α)% confidence limits for RR may be set as

    $$\mathrm{RR}_L, \mathrm{RR}_U = \exp\left[\ln \widehat{\mathrm{RR}} \mp Z \sqrt{\widehat{\mathrm{Var}}(\ln \widehat{\mathrm{RR}})}\right], \qquad \text{(eq C)}$$

    where Z is the positive square root of the 100(1 - α) centile of the chi-square distribution with one degree of freedom.

    Alternatively, the limits may be computed by the use of the test-based method (2):

    $$\mathrm{RR}_L, \mathrm{RR}_U = \widehat{\mathrm{RR}}^{\,1 \mp Z/\chi}, \qquad \text{(eq D)}$$

    where χ is the square root of the test statistic in equation A.

    Example. Suppose the final case series, with some cases found only on the basis of the base sample, included 10 in the index category (D = 1) of the determinant D under study, and 50 in the reference category (D = 0), ie, that c1 = 10, c0 = 50, and c = 60. Suppose, too, that the sample of the base included 10 with D = 1 and 90 with D = 0, ie, that b1 = 10, b0 = 90, and b = 100. Suppose, finally, that the corresponding numbers of cases in the base
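Equations A to D of the appendix can be strung together into one small routine. The Python sketch below uses the counts given in the appendix example (c1 = 10, c0 = 50, b1 = 10, b0 = 90); because the surviving text breaks off before stating the number of redundant cases, c* = 2 (both nonexposed, so the noncase subset of the base sample is n1 = 10, n0 = 88) is a purely hypothetical value, as is the function name. Z is obtained as the standard normal 1 - α/2 quantile, which equals the positive square root of the 100(1 - α) centile of the one-degree-of-freedom chi-square distribution.

```python
import math
from statistics import NormalDist


def case_base_analysis(c1, c0, b1, b0, c_star, n1, n0, alpha=0.05):
    """Sketch of appendix equations A-D for a case-base study.

    c1, c0 : final case series by determinant category (D = 1, D = 0)
    b1, b0 : base sample by determinant category
    c_star : number of base-sample members who are cases ("redundant")
    n1, n0 : noncase subset of the base sample (used for the test)
    """
    c, n = c1 + c0, n1 + n0
    rr = (c1 / c0) / (b1 / b0)                                              # eq 7
    var_ln_rr = 1 / c1 + 1 / c0 + (1 - 2 * c_star / c) * (1 / b1 + 1 / b0)  # eq B
    # Positive square root of the chi-square(1 df) centile = normal 1 - alpha/2 quantile.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(var_ln_rr)
    ci = (math.exp(math.log(rr) - half_width), math.exp(math.log(rr) + half_width))  # eq C
    t1, t0 = c1 + n1, c0 + n0
    t = t1 + t0
    chi2 = (c1 - c * t1 / t) ** 2 / (c * n * t1 * t0 / t ** 3)              # eq A
    chi = math.sqrt(chi2)  # eq D is undefined when chi == 0
    test_based = tuple(sorted((rr ** (1 - z / chi), rr ** (1 + z / chi))))  # eq D
    return {"rr": rr, "var_ln_rr": var_ln_rr, "ci": ci,
            "chi2": chi2, "test_based_ci": test_based}


# Counts from the appendix example; c_star = 2 is HYPOTHETICAL.
result = case_base_analysis(c1=10, c0=50, b1=10, b0=90, c_star=2, n1=10, n0=88)
for key, value in result.items():
    print(key, value)
```

With these inputs the point estimate is (10/50)/(10/90) = 1.8; the interval and test results depend on the assumed c*, so they illustrate the mechanics rather than reproduce the author's worked example.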
