dennis-navarro
Case-control studies
• Overview of different types of studies
• Review of general procedures
• Sampling of controls
– implications for measures of association
– implications for bias
• Logistic regression modeling
Learning Objectives
• To understand how the type of control sampling relates to the measures of association that can be estimated
• To understand the differences between the nested case-control study and the case-cohort design and the advantages and disadvantages of these designs
• To understand the basic procedures for logistic regression modeling
Overview of types of case-control studies
• No designated cohort, but the source population should be treated as a cohort
• Within a designated cohort
– Nested case-control
– Case-cohort
• Cases only
– Case-crossover
• Cases and controls
1) Type of sampling
– incidence density
– cumulative ("epidemic")
2) Source of controls
– population-based
– hospital
– neighborhood
– friends
– family
Review of General Procedures
Obtaining cases
1) Define target cases
2) Identify potential cases
3) Confirm diagnosis
4) *Obtain physician's consent
5) Contact case
6) Confirm case's eligibility
7) *Obtain case's consent
8) Obtain exposure data
*Need to account for nonresponse
Obtaining controls
1) Define target controls
2) Define mechanism for identifying controls
3) Contact control
4) Confirm control's eligibility
5) *Obtain control's consent
6) Obtain exposure data
Pros/Cons of Different Methods
Collection of information
• In-person (hospital, clinic, home)
• Telephone
• Mail
Case-cohort study
• T0: assemble the cohort and sample a subcohort to be used in all future analyses
• First case
• Additional cases: compare all cases to the subcohort selected at T0
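A minimal Python sketch of the subcohort step, assuming the cohort is a list of subject IDs and the subcohort is a simple random sample of a fixed fraction (both assumptions; `sample_subcohort` is a hypothetical helper). The subcohort is drawn once at T0, without reference to exposure or later case status, and is reused for every case.

```python
import random

def sample_subcohort(cohort_ids, fraction, seed=None):
    """Case-cohort design: draw the subcohort once, at T0, by simple random sampling.

    Selection ignores exposure and future case status; the same subcohort is
    the comparison group for every case that occurs later.
    """
    rng = random.Random(seed)
    ids = list(cohort_ids)
    k = round(fraction * len(ids))
    return set(rng.sample(ids, k))

# Hypothetical cohort of 1,000 people; a 10% subcohort is drawn at T0.
cohort = range(1, 1001)
subcohort = sample_subcohort(cohort, fraction=0.10, seed=42)

# Hypothetical cases arising during follow-up, all compared with the same subcohort.
cases = {17, 256, 803}
# A case may also be a subcohort member, since the subcohort was drawn before anyone became a case.
cases_in_subcohort = cases & subcohort
```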
Nested case-control study
• T0: assemble the cohort
• First case: randomly select a control from the remaining cohort (those still at risk)
• Likewise, select controls again at each subsequent case's time point; note: each of these later cases was eligible to be a control for the first case
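The risk-set sampling described above can be sketched in Python as follows. This assumes each subject is recorded with a single exit time (diagnosis for cases, end of follow-up for noncases) and that anyone still under follow-up at a case's diagnosis time is in the risk set; `Subject` and `risk_set_sample` are hypothetical names used only for illustration.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Subject:
    id: int
    exit_time: float  # time of diagnosis for cases, end of follow-up for noncases
    is_case: bool

def risk_set_sample(cohort, m=1, seed=None):
    """Nested case-control (risk-set) sampling.

    At each case's diagnosis time, draw m controls at random from everyone
    still under follow-up at that time. People who become cases later are
    eligible, so the same person can be a control and, later, a case.
    """
    rng = random.Random(seed)
    matched_sets = []
    cases = sorted((s for s in cohort if s.is_case), key=lambda s: s.exit_time)
    for case in cases:
        risk_set = [s for s in cohort if s.exit_time >= case.exit_time and s.id != case.id]
        controls = rng.sample(risk_set, min(m, len(risk_set)))
        matched_sets.append((case, controls))
    return matched_sets

# Hypothetical five-person cohort.
cohort = [
    Subject(1, 2.0, True),
    Subject(2, 5.0, True),
    Subject(3, 4.0, False),
    Subject(4, 1.0, False),   # leaves follow-up before the first case, so never eligible
    Subject(5, 6.0, False),
]
sets = risk_set_sample(cohort, m=1, seed=0)
```

For the first case, subject 2 (a future case) is eligible as a control, which is exactly the point made in the last bullet.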
Crossover designs
(Figure: a period when the subject is exposed and a period when the subject is unexposed)
• Case-crossover: a variation of the crossover design; each case's pre-disease period is used as the control period
• Good method for controlling confounding by characteristics that do not change over time, since each case serves as his/her own control
• May have limited applications
• Assumes that neither exposure nor confounders are changing over time in a systematic way
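As an illustration, assume one control period per case (1:1 matching). Under that assumption the matched-pair (Mantel-Haenszel) odds ratio is the number of cases exposed only in the hazard period divided by the number exposed only in the control period; concordant cases carry no information. The function name and the counts below are hypothetical.

```python
def case_crossover_or(periods):
    """Matched-pair OR for a 1:1 case-crossover design.

    periods: one (exposed_in_hazard_period, exposed_in_control_period) pair of
    booleans per case. Only discordant cases contribute to the estimate.
    """
    n10 = sum(1 for hazard, control in periods if hazard and not control)
    n01 = sum(1 for hazard, control in periods if control and not hazard)
    if n01 == 0:
        raise ValueError("no cases exposed only in the control period; OR is undefined")
    return n10 / n01

# Hypothetical 30 cases: 12 exposed only just before onset, 4 only in the control period,
# 5 exposed in both periods and 9 in neither.
periods = [(True, False)] * 12 + [(False, True)] * 4 + [(True, True)] * 5 + [(False, False)] * 9
or_cc = case_crossover_or(periods)   # 12 / 4
```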
Source Population
• Think of all case-control studies as nested within a cohort, even when the cohort is not designated
• The source population is this underlying cohort
• The source population is the cohort giving rise to the cases; controls are also drawn from this source population
• The source population reflects the disease under study, difficulties in diagnosing the disease, routine procedures for recording disease occurrence, and the frequency of disease
Classic case-control
(Figure: in the source population, cases, exposed (A) and non-exposed (C), are sampled with sampling fraction f1; non-cases, exposed (B) and non-exposed (D), are sampled as controls with sampling fraction f2)

              Cases   Controls
  Exposed       A        B
  Unexposed     C        D

Odds ratio = AD/BC

If sampling:
$$\mathrm{OR} = \frac{(f_1 A)(f_2 D)}{(f_1 C)(f_2 B)} = \frac{AD}{BC}$$
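The cancellation of f1 and f2 can be checked numerically. The sketch below uses a hypothetical source population and expected (not randomly sampled) counts. It also shows that if the control sampling fraction differed by exposure, so that it was no longer independent of exposure, the OR would be biased.

```python
def odds_ratio(a, b, c, d):
    """OR = AD/BC; a, c = exposed/unexposed cases, b, d = exposed/unexposed noncases or controls."""
    return (a * d) / (b * c)

# Hypothetical source population.
A, B, C, D = 200, 9_800, 100, 19_900
full_or = odds_ratio(A, B, C, D)

# Sample all cases (f1 = 1) and 1% of noncases (f2 = 0.01), using expected counts.
f1, f2 = 1.0, 0.01
sampled_or = odds_ratio(f1 * A, f2 * B, f1 * C, f2 * D)   # same as full_or

# Exposed noncases sampled at twice the rate of unexposed noncases:
# the sampling fraction now depends on exposure and the OR is halved.
biased_or = odds_ratio(f1 * A, 2 * f2 * B, f1 * C, f2 * D)
```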
Incidence Density Sampling
1) Collect information on each case
2) Collect controls at the same time each case is observed, from the underlying source population giving rise to the cases
3) Controls can be cases (a person sampled as a control may later become a case)

Cumulative-based sampling
1) Start after the event has happened
2) Ascertain cases
3) Collect controls from the noncases after the event (e.g., an epidemic)
Measures of Association
• Key concept: the sampling fraction must be independent of exposure
• Incidence density sampling/nested case-control studies
– if the exposure odds in controls (B/D) approximates the person-time ratio (exposed:unexposed) in the source population, the odds ratio will approximate the rate ratio
– no rare disease assumption is needed
• Case-cohort
– if the ratio B/D approximates the overall exposed:unexposed ratio in the source population, the odds ratio will approximate the risk ratio
– no rare disease assumption is needed
• Cumulative ("epidemic") case-control studies
– the odds ratio will approximate the rate ratio if the proportion diseased in each exposure group is low (< 20%) and remains steady during the study period
When is the rare disease assumption needed?
• Cumulative-based sampling, if we want the odds ratio to approximate the relative risk
• Otherwise
– nested case-control and incidence density sampling give an estimate of the rate ratio without the rare disease assumption
– case-cohort gives an estimate of the risk ratio without the rare disease assumption
• If the disease is rare, all three measures will be very close
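A small numerical check of the last bullet, assuming a closed cohort with constant incidence rates and no losses or competing risks (so risk = 1 − exp(−rate × time)). The odds ratio computed here is what cumulative sampling of noncases at the end of follow-up would estimate. The function name and rates are hypothetical.

```python
import math

def measures_from_rates(rate1, rate0, years):
    """Closed cohort, constant rates: return (rate ratio, risk ratio, cumulative odds ratio)."""
    risk1 = 1 - math.exp(-rate1 * years)
    risk0 = 1 - math.exp(-rate0 * years)
    irr = rate1 / rate0
    rr = risk1 / risk0
    odds_r = (risk1 / (1 - risk1)) / (risk0 / (1 - risk0))
    return irr, rr, odds_r

rare = measures_from_rates(0.002, 0.001, years=1.0)    # ≈ (2.0, 1.999, 2.001): all very close
common = measures_from_rates(0.6, 0.3, years=2.0)      # ≈ (2.0, 1.55, 2.82): they diverge
```

With common disease the three measures separate, which is why the choice of control sampling scheme matters.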
A closer look
Recall:
$$\mathrm{OR} = \frac{A_1/B_1}{A_0/B_0} = \frac{A_1 B_0}{A_0 B_1}$$
$$\mathrm{IR} = \frac{I_1}{I_0} = \frac{A_1/T_1}{A_0/T_0} = \frac{A_1 T_0}{A_0 T_1}$$
Note: in R&G notation A1 = A, A0 = C, B1 = B, B0 = D
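Comparing the two expressions, the OR equals the IR when the exposure ratio among controls, B1/B0, equals the person-time ratio T1/T0. If sampling does not lead to bias and the controls approximate the person-time distribution of the source population, a case-control study is a more efficient design (i.e., more precision for the same number of people studied). However, it is not as precise as using the full cohort.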
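Pseudo-rates
Control sampling rate (T1, T0 = exposed and unexposed person-time):
$$r = \frac{B_1}{T_1} = \frac{B_0}{T_0}$$
This assumes that the sampling rates are the same for the exposed and unexposed.
$$\text{Pseudo-rates: } \frac{A_1}{B_1} = \frac{A_1}{r T_1}, \qquad \frac{A_0}{B_0} = \frac{A_0}{r T_0}$$
If r is known, multiply each pseudo-rate by r to get the rate:
$$I_1 = \frac{A_1}{T_1} = r \cdot \frac{A_1}{B_1}, \qquad I_0 = \frac{A_0}{T_0} = r \cdot \frac{A_0}{B_0}$$
If r is unknown, the ratio of the pseudo-rates still gives the rate ratio:
$$\frac{A_1/B_1}{A_0/B_0} = \frac{A_1/(r T_1)}{A_0/(r T_0)} = \frac{A_1/T_1}{A_0/T_0} = \mathrm{IR}$$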
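A worked version of the pseudo-rate argument in Python, using a hypothetical cohort and expected control counts B1 = r·T1 and B0 = r·T0 (that is, assuming the same sampling rate r in both exposure groups).

```python
# Hypothetical cohort: A1, A0 cases and T1, T0 person-years among exposed and unexposed.
a1, a0 = 60, 40
t1, t0 = 20_000.0, 60_000.0

# Controls sampled at the same rate r (per person-year) in both exposure groups,
# so the expected control counts are B1 = r*T1 and B0 = r*T0.
r = 0.005
b1, b0 = r * t1, r * t0

pseudo_rate1, pseudo_rate0 = a1 / b1, a0 / b0

# r known: multiplying each pseudo-rate by r recovers the incidence rate A/T.
rate1, rate0 = r * pseudo_rate1, r * pseudo_rate0

# r unknown: the ratio of pseudo-rates (the OR) still equals the rate ratio,
# here (60/20000) / (40/60000) = 4.5.
rate_ratio = pseudo_rate1 / pseudo_rate0
```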
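Pseudo-risks
Control sampling fraction (N1, N0 = numbers of exposed and unexposed persons in the source cohort):
$$f = \frac{B_1}{N_1} = \frac{B_0}{N_0}$$
This assumes that the sampling fractions are the same for the exposed and unexposed.
$$\text{Pseudo-risks: } \frac{A_1}{B_1} = \frac{A_1}{f N_1}, \qquad \frac{A_0}{B_0} = \frac{A_0}{f N_0}$$
If f is known, multiply each pseudo-risk by f to get the incidence proportion:
$$R_1 = \frac{A_1}{N_1} = f \cdot \frac{A_1}{B_1}, \qquad R_0 = \frac{A_0}{N_0} = f \cdot \frac{A_0}{B_0}$$
If f is unknown, the ratio of the pseudo-risks still gives the risk ratio:
$$\frac{A_1/B_1}{A_0/B_0} = \frac{A_1/(f N_1)}{A_0/(f N_0)} = \frac{A_1/N_1}{A_0/N_0} = \mathrm{RR}$$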
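The same check for pseudo-risks, with a hypothetical cohort in which the disease is not rare (15% and 5% risk). It assumes controls are sampled with the same fraction f from everyone at T0, as in a case-cohort design, and uses expected counts B1 = f·N1 and B0 = f·N0.

```python
# Hypothetical cohort: N1 exposed and N0 unexposed persons at T0; A1, A0 cases by end of follow-up.
n1, n0 = 2_000, 8_000
a1, a0 = 300, 400

# Same sampling fraction f for exposed and unexposed: B1 = f*N1, B0 = f*N0.
f = 0.05
b1, b0 = f * n1, f * n0

# Pseudo-risks can exceed 1; they are not probabilities until multiplied by f.
pseudo_risk1, pseudo_risk0 = a1 / b1, a0 / b0

# f known: recover the incidence proportions A/N.
risk1, risk0 = f * pseudo_risk1, f * pseudo_risk0

# f unknown: the ratio of pseudo-risks still equals the risk ratio, with no rare disease assumption.
risk_ratio = pseudo_risk1 / pseudo_risk0
```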
Summary
• Odds ratio approximates rate ratio
– incidence density sampling
– nested case-control studies
– cumulative sampling, if the proportion diseased is steady and relatively low
• Odds ratio approximates risk ratio
– case-cohort design
– cumulative sampling, if disease is rare
Source of Controls
• Population-based: not necessarily equal probability; control selection probability is proportional to the individual's person-time at risk
• Neighborhood: need to think about referral patterns (e.g., not good for a veterans' hospital); also need to worry about overmatching
• Hospital: need to be especially concerned that sampling was independent of exposure; try to use a variety of diagnoses for controls
• Friends: main problem is overmatching; cedes control selection to the case rather than to the investigator
• Family: may be a worthwhile design to control for certain variables (e.g., spouse controls for environment; twin controls for genetics and environment)
Review of control selection
• Select from the source population
• Select independently of exposure status
• Probability of selection should be proportional to the amount of person-time s/he would have contributed to the denominator
• A person is not eligible to be a control if, during the same time, s/he would not have been eligible to be a case
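A sketch of selecting controls with probability proportional to person-time at risk, assuming person-time is already known for each member of the source population (the helper name and data are hypothetical). Sampling is with replacement, as in incidence density sampling, and someone with no time at risk (not eligible to be a case) gets zero weight and is never drawn.

```python
import random
from collections import Counter

def sample_controls_by_person_time(person_time, n_controls, seed=None):
    """Draw controls with probability proportional to each person's person-time at risk.

    Sampling is with replacement, so the same person can be drawn more than once.
    """
    rng = random.Random(seed)
    ids = list(person_time)
    weights = [person_time[i] for i in ids]
    return rng.choices(ids, weights=weights, k=n_controls)

# Hypothetical source population: person-years at risk for three people.
person_time = {"p1": 1.0, "p2": 3.0, "p3": 0.0}   # p3 contributes no time at risk
draws = Counter(sample_controls_by_person_time(person_time, 100_000, seed=1))
# p2 should be selected about 3/4 of the time; p3 never.
```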
Ille-et-Vilaine study (epidat1.txt)
• conducted between January 1972 and April 1974
• French department of Ille-et-Vilaine (Brittany)
• Men diagnosed in regional hospitals
• Controls sampled from electoral lists in each commune of department
Other methodological points
• Exposure opportunity
– interest is in the fact of exposure
– think about the cohort design
– make the same exclusions for cases as for controls
• Comparability of information
– may not want comparability if errors are not independent
• Number of different types of control groups
– may be appropriate in some situations (e.g., spouse, siblings), but generally not recommended
• Prior disease history
Regression Modeling
Regression model: a simpler function used to estimate the true regression function, E(Y | X = x)
Benefit of regression modeling: overcomes the numerical limitations of categorical analysis
Cost of regression modeling: the model's assumptions; invalid inferences if the model is misspecified
Y = dependent variable, outcome variable, regressand
X = set of independent variables, predictors, covariates, regressors
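Logistic Regression
Logistic risk model, bounded by 0 and 1:
$$R(x) = \Pr(Y = 1 \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}} = \frac{1}{1 + e^{-(\alpha + \beta x)}}$$
Equivalently, the log odds is linear in x:
$$\log\frac{R(x)}{1 - R(x)} = \alpha + \beta x$$
When x = 0:
$$\log\frac{R(0)}{1 - R(0)} = \alpha$$
When x = 1:
$$\log\frac{R(1)}{1 - R(1)} = \alpha + \beta$$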
Interpreting Coefficients
If X is continuous, β is the change in the log odds of Y for a one-unit change in X, and exp(β) is the OR for a one-unit change. Note: modeling a variable either as an ordered categorical variable or as a continuous variable assumes that the effect of a one-unit change in X is the same irrespective of the level of X.
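If X = 1 if exposed, 0 if unexposed:
$$\log\frac{\Pr(Y=1 \mid x=1)}{1 - \Pr(Y=1 \mid x=1)} - \log\frac{\Pr(Y=1 \mid x=0)}{1 - \Pr(Y=1 \mid x=0)} = \beta, \qquad \mathrm{OR} = e^{\beta}$$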
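A minimal maximum-likelihood fit by Newton-Raphson (iteratively reweighted least squares) illustrates OR = e^β. This is a sketch rather than a production routine. It assumes NumPy and uses a hypothetical 2×2 table. With one binary exposure the model is saturated, so e^β reproduces AD/BC and the model-based standard error matches the usual √(1/A + 1/B + 1/C + 1/D). In case-control data the intercept reflects the case:control sampling ratio, so only β is interpreted.

```python
import math
import numpy as np

def fit_logistic(X, y, max_iter=25, tol=1e-10):
    """Logistic regression by Newton-Raphson (IRLS).

    X: (n, p) design matrix whose first column is all ones (the intercept).
    y: length-n array of 0/1 outcomes (1 = case).
    Returns (coefficients, standard errors from the inverse information matrix).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        risk = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted R(x) at current estimates
        weights = risk * (1.0 - risk)               # binomial variance
        score = X.T @ (y - risk)
        information = X.T @ (X * weights[:, None])
        step = np.linalg.solve(information, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    risk = 1.0 / (1.0 + np.exp(-X @ beta))
    information = X.T @ (X * (risk * (1.0 - risk))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(information)))
    return beta, se

# Hypothetical 2x2 table: A, C = exposed/unexposed cases; B, D = exposed/unexposed controls.
A, B, C, D = 40, 60, 20, 80
x = np.array([1] * A + [0] * C + [1] * B + [0] * D)
y = np.array([1] * (A + C) + [0] * (B + D))
design = np.column_stack([np.ones(len(x)), x])

beta, se = fit_logistic(design, y)
model_or = math.exp(beta[1])                          # OR = e^beta
crude_or = (A * D) / (B * C)                          # AD/BC from the table
woolf_se = math.sqrt(1 / A + 1 / B + 1 / C + 1 / D)   # usual SE of log(OR) for a 2x2 table
```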
Logistic Modeling
• Linear model: Y = probability of disease (not bounded by 0 and 1)
• Logistic model: log[Y/(1 − Y)] = log odds of disease, where Y/(1 − Y) = odds of disease
Assessing Linearity
• Create categories of the continuous variable; categories should span the same number of units (e.g., 0-39 grams, 40-79 grams, 80-119 grams, etc.)
• Plot the beta coefficients for the categories; if the pattern is approximately linear, keep the variable as a continuous variable in the model
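One way to obtain the category coefficients without fitting software, assuming a single categorical exposure and no other covariates: the indicator-variable model is then saturated, so each coefficient equals the crude log OR of that category versus the reference. The counts below are hypothetical and chosen so that the odds double in each successive category.

```python
import math

def category_log_odds_ratios(cases, controls, ref=0):
    """Log OR for each exposure category versus the reference category.

    With one categorical exposure entered as indicator variables, the logistic
    model is saturated, so its indicator coefficients equal these crude log ORs.
    """
    ref_odds = cases[ref] / controls[ref]
    return [math.log((ca / co) / ref_odds) for ca, co in zip(cases, controls)]

# Hypothetical counts for equal-width categories: 0-39, 40-79, 80-119, 120-159 grams/day.
cases = [20, 40, 60, 80]
controls = [200, 200, 150, 100]
log_ors = category_log_odds_ratios(cases, controls)

# Change in log OR between adjacent categories; roughly equal steps suggest a
# linear trend, i.e. X could be kept as a single continuous term.
steps = [later - earlier for earlier, later in zip(log_ors, log_ors[1:])]
```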
Creating Indicator Variables
Also referred to as categorical regressors or dummy variables
Need to pick a reference level
Number of indicator variables created = number of categories − 1
The intercept term picks up the effect of the left-out (reference) category
Note: the precision of the estimates will depend on which category you pick as the referent
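A plain-Python sketch of indicator coding, assuming the categories are given as labels and the reference level is chosen by the analyst (the function name is hypothetical). Each non-reference level gets one 0/1 column, and rows in the reference category are all zeros.

```python
def indicator_columns(values, levels, reference):
    """Code a categorical variable as 0/1 indicator (dummy) variables.

    Returns (column names, rows): one column per non-reference level, i.e.
    len(levels) - 1 columns. Rows in the reference category are all zeros,
    so that category is absorbed by the intercept.
    """
    kept = [lev for lev in levels if lev != reference]
    rows = [[1 if v == lev else 0 for lev in kept] for v in values]
    return kept, rows

levels = ["0-39", "40-79", "80-119", "120+"]
values = ["0-39", "40-79", "120+", "0-39", "80-119"]
names, rows = indicator_columns(values, levels, reference="0-39")
```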
Things to keep in mind
Note: for some variables, nonusers/nonconsumers are different from users/consumers; therefore, you may want to keep a separate zero category
Last category may be too heterogeneous
Balance between homogeneity and parsimony
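Changing units
One-unit increase:
Point estimate: $e^{b}$
Lower 95% CL: $e^{b - 1.96\,\mathrm{se}(b)}$
Upper 95% CL: $e^{b + 1.96\,\mathrm{se}(b)}$
X-unit increase (e.g., X = 10):
Point estimate: $e^{10 b}$
Lower 95% CL: $e^{10 b - 1.96 \cdot 10\,\mathrm{se}(b)}$
Upper 95% CL: $e^{10 b + 1.96 \cdot 10\,\mathrm{se}(b)}$
Note: can also rescale the variable; see p. 371 of R&G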
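The formulas above as a small Python helper (hypothetical name), assuming a Wald-type interval with z = 1.96. Scaling both b and se(b) by the number of units means each quantity for a 10-unit increase is the one-unit quantity raised to the 10th power. The coefficient and standard error are hypothetical.

```python
import math

def or_with_ci(b, se_b, units=1.0, z=1.96):
    """OR and Wald 95% CI for a `units`-unit increase in a continuous X."""
    est = units * b
    half_width = z * units * se_b
    return math.exp(est), math.exp(est - half_width), math.exp(est + half_width)

# Hypothetical coefficient and standard error per gram/day.
b, se_b = 0.02, 0.008
per_1 = or_with_ci(b, se_b)              # (OR, lower, upper) per 1 gram/day
per_10 = or_with_ci(b, se_b, units=10)   # per 10 grams/day; point estimate = e^0.2 ≈ 1.22
```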
Examining confounding and mediation
1) Run a model with the exposure only
2) Run a model with the exposure and the potential confounder (and/or mediator)
3) Examine the change in the exposure estimate between models 1 and 2
– if different, keep the confounder in the model
– many use a 10% change-in-estimate criterion
– if the estimate is different but the exposure is still associated with the outcome: partial confounding
– the data cannot distinguish between a confounder and a mediator
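A sketch of the 10% change-in-estimate check, assuming the exposure ORs from model 1 (exposure only) and model 2 (exposure plus covariate) are already available. Conventions differ: some analysts divide by the crude rather than the adjusted estimate, or work on the β scale. Here the adjusted OR is the denominator, and the function names are hypothetical.

```python
def percent_change_in_estimate(or_crude, or_adjusted):
    """Percent change in the exposure OR between model 1 (crude) and model 2 (adjusted).

    Assumed convention: |crude - adjusted| / adjusted, on the OR scale.
    """
    return 100.0 * abs(or_crude - or_adjusted) / or_adjusted

def keep_covariate(or_crude, or_adjusted, threshold=10.0):
    """True if adding the covariate changes the exposure OR by at least `threshold` percent."""
    return percent_change_in_estimate(or_crude, or_adjusted) >= threshold

# Hypothetical estimates.
change = percent_change_in_estimate(2.5, 2.0)   # 25% change
keep = keep_covariate(2.5, 2.0)                 # exceeds the 10% criterion
```

As the slide notes, a large change flags the covariate as a confounder or mediator but cannot say which.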
Comparing Models
Model A: $\ln\frac{y}{1-y} = \alpha + \beta_1 x + \beta_2 z$
Model B: $\ln\frac{y}{1-y} = \alpha + \beta_1 x$
If model B is no worse than model A, we want to use model B. Always favor parsimony if we don't have to give up anything.
One way of comparing models is the likelihood ratio test.
Likelihood Ratio Tests
$$\mathrm{LR} = -2\log L_B - (-2\log L_A)$$
where $L_B$ is the likelihood for model B (smaller model) and $L_A$ is the likelihood for model A (larger model).
LR ~ chi-square with df = (total number of parameters in the larger model − total number of parameters in the smaller model)
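A standard-library sketch of the likelihood ratio test. It assumes the maximized log likelihoods of both models are already available; the values below are hypothetical. The chi-square tail probability is computed from the power series for the regularized incomplete gamma function. This is a swapped-in numerical method chosen to avoid extra dependencies, and it is adequate for moderate LR values; a statistics library would normally supply this.

```python
import math

def chi2_sf(x, df, tol=1e-14, max_terms=100_000):
    """P(chi-square with df degrees of freedom > x), via the series for P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = math.exp(-math.lgamma(s + 1.0))   # n = 0 term: 1 / Gamma(s + 1)
    total = term
    for n in range(1, max_terms):
        term *= z / (s + n)                   # now z**n / Gamma(s + n + 1)
        total += term
        if term < tol * total:
            break
    lower = total * math.exp(s * math.log(z) - z)   # regularized lower incomplete gamma
    return max(0.0, 1.0 - lower)

def likelihood_ratio_test(loglik_larger, loglik_smaller, k_larger, k_smaller):
    """LR = -2 log L(smaller) - (-2 log L(larger)); df = difference in number of parameters."""
    lr = -2.0 * loglik_smaller - (-2.0 * loglik_larger)
    df = k_larger - k_smaller
    return lr, df, chi2_sf(lr, df)

# Hypothetical: model A (alpha, beta1, beta2) vs model B (alpha, beta1).
lr, df, p_value = likelihood_ratio_test(loglik_larger=-118.2, loglik_smaller=-120.5,
                                        k_larger=3, k_smaller=2)
```

Here LR = 4.6 on 1 df, so model A fits significantly better at the 5% level and the simpler model B would not be preferred.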