Detecting Spatial Clustering in Matched Case-Control Studies

Andrea Cook, MS
Collaboration with: Dr. Yi Li
November 4, 2004


Outline

1. Motivation
   • Petrochemical exposure in relation to childhood brain and leukemia cancers
2. Cumulative Geographic Residuals
   • Unconditional
   • Conditional
3. Simulation Results
   • Type I error
   • Power Calculations
4. Application
   • Childhood Leukemia
   • Childhood Brain Cancer
5. Software
6. Discussion
   • Limitations
   • Future Research

Taiwan Petrochemical Study

Matched Case-Control Study
• 3 controls per case
• Matched on age and gender
• Resided in one of 26 of the overall 38 administrative districts of Kaohsiung County, Taiwan
• Controls selected using national identity numbers (not dependent on location)

Study Population

Due to dropout, approximately 50% of the matched sets have 3-to-1 matching, 40% have 2-to-1 matching, and 10% have 1-to-1 matching.

            Leukemia   Brain Cancer
Cases          121         111
Controls       287         259

Map of Kaohsiung

[Figure: map of Kaohsiung. Labeled locations: Nantze, Jenwu, Linyuan, Tsoying. Legend: # Study Participants, $ Petro Plants.]

Cumulative Residuals

Unconditional (Independence)
• Model definition using logistic regression
• Extension to Cluster Detection

Conditional (Matched Design)
• Model definition using conditional logistic regression
• Extension to Cluster Detection

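Logistic Model

Assume the logistic model where,

$$L(Y_i \mid p_i) = p_i^{Y_i}\,(1 - p_i)^{1 - Y_i}$$

and the link function,

$$g(p_i) = \operatorname{logit}(p_i) = \boldsymbol{\beta}^{T}\mathbf{X}_i .$$

Therefore the likelihood score function for $\boldsymbol{\beta}$ is

$$U(\boldsymbol{\beta}) = \sum_{i=1}^{n} \mathbf{X}_i \left[ Y_i - \frac{\exp(\boldsymbol{\beta}^{T}\mathbf{X}_i)}{1 + \exp(\boldsymbol{\beta}^{T}\mathbf{X}_i)} \right]$$

with information matrix

$$I(\boldsymbol{\beta}) = \sum_{i=1}^{n} \frac{\exp(\boldsymbol{\beta}^{T}\mathbf{X}_i)}{\left[1 + \exp(\boldsymbol{\beta}^{T}\mathbf{X}_i)\right]^{2}}\, \mathbf{X}_i \mathbf{X}_i^{T} .$$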
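A minimal sketch of the logistic score and information matrix, assuming each $\mathbf{X}_i$ is stored as a plain list of floats. The function names `expit`, `score`, and `information`, and the toy data, are hypothetical and not part of the talk. The code uses the identity $\exp(z)/[1+\exp(z)]^2 = p(1-p)$.

```python
import math

def expit(z):
    """Logistic function exp(z) / (1 + exp(z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def score(beta, X, Y):
    """U(beta) = sum_i X_i * (Y_i - p_i), with p_i = expit(beta' X_i)."""
    u = [0.0] * len(beta)
    for x, y in zip(X, Y):
        p = expit(sum(b * xk for b, xk in zip(beta, x)))
        for k, xk in enumerate(x):
            u[k] += xk * (y - p)
    return u

def information(beta, X):
    """I(beta) = sum_i p_i * (1 - p_i) * X_i X_i'."""
    d = len(beta)
    info = [[0.0] * d for _ in range(d)]
    for x in X:
        p = expit(sum(b * xk for b, xk in zip(beta, x)))
        w = p * (1.0 - p)
        for a in range(d):
            for c in range(d):
                info[a][c] += w * x[a] * x[c]
    return info

# Toy data (hypothetical): an intercept column plus one covariate.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [0, 0, 1, 1]
U0 = score([0.0, 0.0], X, Y)        # score evaluated at beta = 0
I0 = information([0.0, 0.0], X)     # information evaluated at beta = 0
```

At $\boldsymbol{\beta} = 0$ every $p_i$ equals 1/2, so the score reduces to $\sum_i \mathbf{X}_i (Y_i - 1/2)$ and the information to $\tfrac{1}{4}\sum_i \mathbf{X}_i\mathbf{X}_i^T$, which makes the output easy to check by hand.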

Residual Formulation

Then define a residual as,

$$e_i = Y_i - \frac{\exp(\hat{\boldsymbol{\beta}}^{T}\mathbf{X}_i)}{1 + \exp(\hat{\boldsymbol{\beta}}^{T}\mathbf{X}_i)}$$

where $\hat{\boldsymbol{\beta}}$ is the solution to $U(\boldsymbol{\beta}) = 0$.

Assuming the model is correctly specified would imply there is no pattern in residuals.

=> Use Residuals to test for misspecification.

Cumulative Residuals for Model Checking; Lin, Wei, Ying 2002
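A minimal sketch of how such residuals might be computed, assuming a model with an intercept and a single covariate so that the Newton-Raphson update needs only a 2 x 2 solve. The helper names `fit_logistic` and `residuals` and the toy data are hypothetical.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, iters=50):
    """Newton-Raphson for logit(p_i) = b0 + b1 * x_i, i.e. solve U(beta) = 0."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            p = expit(b0 + b1 * xi)
            w = p * (1.0 - p)
            u0 += yi - p              # score, intercept component
            u1 += xi * (yi - p)       # score, slope component
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        # beta <- beta + I(beta)^{-1} U(beta), with the 2 x 2 inverse written out
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1

def residuals(x, y, b0, b1):
    """e_i = Y_i - fitted probability."""
    return [yi - expit(b0 + b1 * xi) for xi, yi in zip(x, y)]

# Toy data (hypothetical, chosen so that cases and controls overlap in x).
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat = fit_logistic(x, y)
e = residuals(x, y, b0_hat, b1_hat)
```

Because the fitted coefficients solve the score equations, the residuals sum to zero and are orthogonal to the covariate. Any remaining structure, such as spatial clumping of positive residuals, is what the test looks for.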

Hypothesis Test

Hypothesis of interest,

Geographic Location, $(r_i, t_i)$, Independent of Outcome, $Y_i \mid \mathbf{X}_i$

=> Cumulative Geographic Residual Moving Block Process is Patternless

Unconditional Cluster Detection

Define the Cumulative Geographic Residual Moving Block Process as,

$$W_{loc}(x_1, x_2 \mid b_1, b_2) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} I\left(x_1 - b_1 < r_i \le x_1,\; x_2 - b_2 < t_i \le x_2\right) e_i$$
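A minimal sketch of the moving block process, assuming the block with upper-right corner $(x_1, x_2)$ and side lengths $(b_1, b_2)$ is open on the lower edge and closed on the upper edge. The names `w_loc` and `sup_abs_w` and the toy data are hypothetical.

```python
import math

def w_loc(x1, x2, b1, b2, r, t, e):
    """Sum of the residuals of points inside the block, scaled by 1 / sqrt(n)."""
    n = len(e)
    inside = sum(ei for ri, ti, ei in zip(r, t, e)
                 if x1 - b1 < ri <= x1 and x2 - b2 < ti <= x2)
    return inside / math.sqrt(n)

def sup_abs_w(b1, b2, r, t, e):
    """Largest |W_loc| over block positions.

    Block membership only changes when an edge crosses a point, so it is
    enough to try corners at the observed coordinates and at those
    coordinates shifted by the block size.
    """
    xs = set(r) | {ri + b1 for ri in r}
    ts = set(t) | {ti + b2 for ti in t}
    return max(abs(w_loc(x1, x2, b1, b2, r, t, e)) for x1 in xs for x2 in ts)

# Toy data (hypothetical): four points on a diagonal with alternating residuals.
r = [1.0, 2.0, 3.0, 4.0]
t = [1.0, 2.0, 3.0, 4.0]
e = [0.5, -0.5, 0.5, -0.5]
w_single = w_loc(3.0, 3.0, 1.0, 1.0, r, t, e)   # block holds only the third point
w_pair = w_loc(2.0, 2.0, 2.0, 2.0, r, t, e)     # block holds the first two points
s = sup_abs_w(1.0, 1.0, r, t, e)
```

The brute-force search costs on the order of $n^3$ operations, which is acceptable for illustration but would need a smarter scan for large studies.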

Asymptotic Distribution

However, the asymptotic distribution of $W_{loc}(\cdot, \cdot \mid b_1, b_2)$ is difficult to simulate, but it has been shown to be equivalent to the following distribution, conditional on the observed data,

$$\hat{W}_{loc}(x_1, x_2 \mid b_1, b_2) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} I\left(x_1 - b_1 < r_i \le x_1,\; x_2 - b_2 < t_i \le x_2\right) e_i G_i \;-\; \frac{1}{\sqrt{n}}\, \boldsymbol{\eta}^{T}(x_1, x_2 \mid \hat{\boldsymbol{\beta}})\, I^{-1}(\hat{\boldsymbol{\beta}}) \sum_{i=1}^{n} \mathbf{X}_i e_i G_i$$

where

$$\boldsymbol{\eta}(x_1, x_2 \mid \boldsymbol{\beta}) = \sum_{i=1}^{n} I\left(x_1 - b_1 < r_i \le x_1,\; x_2 - b_2 < t_i \le x_2\right) \frac{\exp(\boldsymbol{\beta}^{T}\mathbf{X}_i)}{\left[1 + \exp(\boldsymbol{\beta}^{T}\mathbf{X}_i)\right]^{2}}\, \mathbf{X}_i$$

and $G_1, \ldots, G_n \overset{iid}{\sim} N(0, 1)$, independent of the observed data.

Significance Test

Testing the NULL

$$H_0 : Y_i \mid \mathbf{X}_i \;\perp\; (r_i, t_i)$$

• Simulate N realizations of $\hat{W}_{loc}(\cdot, \cdot \mid b_1, b_2)$,

$$\hat{W}_{loc,1}(\cdot, \cdot \mid b_1, b_2), \ldots, \hat{W}_{loc,N}(\cdot, \cdot \mid b_1, b_2),$$

by repeatedly simulating $(G_1, \ldots, G_n)$, while fixing the data at their observed values.

• Calculate P-value

$$\text{P-value} = \frac{1}{N} \sum_{j=1}^{N} I\left(\hat{S}_{loc,j}(b_1, b_2) \ge S_{loc}(b_1, b_2)\right)$$

where

$$S_{loc}(b_1, b_2) = \sup_{x_1, x_2} \left| W_{loc}(x_1, x_2 \mid b_1, b_2) \right| \quad \text{and} \quad \hat{S}_{loc,j}(b_1, b_2) = \sup_{x_1, x_2} \left| \hat{W}_{loc,j}(x_1, x_2 \mid b_1, b_2) \right| .$$
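A minimal sketch of the resampling test, under a loud simplifying assumption that is not in the talk: the logistic model has an intercept only. Then $\hat{p}$ is the case fraction, $e_i = Y_i - \hat{p}$, and the correction term $\boldsymbol{\eta}^T I^{-1} \sum_i \mathbf{X}_i e_i G_i$ reduces to $(m/n) \sum_i e_i G_i$, where $m$ is the number of points in the block. The function names, the standard normal choice of $G_i$, and the toy data are hypothetical.

```python
import math
import random

def in_block(ri, ti, x1, x2, b1, b2):
    """Indicator I(x1 - b1 < r_i <= x1, x2 - b2 < t_i <= x2)."""
    return x1 - b1 < ri <= x1 and x2 - b2 < ti <= x2

def p_value(s_obs, s_sim):
    """Share of simulated suprema at least as large as the observed one."""
    return sum(1 for s in s_sim if s >= s_obs) / len(s_sim)

def cluster_test(r, t, y, b1, b2, n_sim=200, seed=0):
    """Multiplier-resampling p-value for an intercept-only logistic model."""
    rng = random.Random(seed)
    n = len(y)
    p_hat = sum(y) / n                      # MLE under the intercept-only model
    e = [yi - p_hat for yi in y]            # residuals e_i

    # Distinct non-empty blocks, found by trying corners at observed
    # coordinates and at coordinates shifted by the block size.
    xs = set(r) | {ri + b1 for ri in r}
    ts = set(t) | {ti + b2 for ti in t}
    blocks = set()
    for x1 in xs:
        for x2 in ts:
            idx = tuple(i for i in range(n)
                        if in_block(r[i], t[i], x1, x2, b1, b2))
            if idx:
                blocks.add(idx)

    def sup_stat(w):
        # w[i] is e_i for the observed process or e_i * G_i for a simulated one.
        # For the observed residuals sum(w) is zero, so the correction vanishes.
        total = sum(w)
        best = 0.0
        for idx in blocks:
            inside = sum(w[i] for i in idx)
            best = max(best, abs(inside - len(idx) / n * total))
        return best / math.sqrt(n)

    s_obs = sup_stat(e)
    s_sim = []
    for _ in range(n_sim):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s_sim.append(sup_stat([ei * gi for ei, gi in zip(e, g)]))
    return p_value(s_obs, s_sim)

# Toy null data (hypothetical): locations uniform on a 10 x 10 square.
data_rng = random.Random(1)
r = [data_rng.uniform(0, 10) for _ in range(30)]
t = [data_rng.uniform(0, 10) for _ in range(30)]
y = [1 if data_rng.random() < 0.3 else 0 for _ in range(30)]
p = cluster_test(r, t, y, b1=3.0, b2=3.0)
```

Only the multipliers are redrawn in each replicate. The locations, outcomes, and residuals stay fixed, so the model is fitted once.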

Conditional Logistic Model

Type of Matching: 1 case to $M_s$ controls

Data Structure:

$$Y_{1s} = 1,\; Y_{2s} = 0,\; \ldots,\; Y_{(M_s+1)s} = 0$$

Conditional on $\alpha_s$, an unobserved stratum-specific intercept, and given the logit link, the model implies,

$$E(Y_{is} \mid s) = \frac{\exp(\boldsymbol{\beta}^{T}\mathbf{X}_{is})}{\sum_{j=1}^{M_s+1} \exp(\boldsymbol{\beta}^{T}\mathbf{X}_{js})} .$$

The conditional likelihood, conditioning on $Y_{1s} + \cdots + Y_{(M_s+1)s} = 1$, is,

$$L(\boldsymbol{\beta}) = \prod_{s=1}^{N} \prod_{i=1}^{M_s+1} \left[ \frac{\exp(\boldsymbol{\beta}^{T}\mathbf{X}_{is})}{\sum_{j=1}^{M_s+1} \exp(\boldsymbol{\beta}^{T}\mathbf{X}_{js})} \right]^{Y_{is}} .$$
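A minimal sketch of the conditional log-likelihood for 1-to-$M_s$ matched sets, assuming each stratum is a list of covariate vectors with the case stored first, as in the data structure above. The name `conditional_loglik` and the toy strata are hypothetical.

```python
import math

def conditional_loglik(beta, strata):
    """log L(beta) = sum_s [ beta' X_1s - log sum_j exp(beta' X_js) ].

    strata: list of matched sets; each set is a list of covariate vectors
    with the case in position 0 and its controls after it.
    """
    ll = 0.0
    for rows in strata:
        lin = [sum(b * x for b, x in zip(beta, row)) for row in rows]
        m = max(lin)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(v - m) for v in lin))
        ll += lin[0] - log_denom
    return ll

# Toy strata (hypothetical): 3-to-1, 2-to-1 and 1-to-1 matched sets.
strata = [
    [[2.0], [0.0], [1.0], [1.0]],
    [[1.0], [3.0], [0.5]],
    [[1.5], [0.5]],
]
ll0 = conditional_loglik([0.0], strata)
ll1 = conditional_loglik([0.5], strata)
```

The stratum intercept $\alpha_s$ never appears in the code because it cancels from the ratio. That is why sets of different sizes can be mixed freely.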

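Score and Information

Denote the conditional likelihood score as,

$$U(\boldsymbol{\beta}) = \sum_{s=1}^{N} U_s(\boldsymbol{\beta}) = \sum_{s=1}^{N} \left[ \mathbf{X}_{1s} - \frac{\sum_{j=1}^{M_s+1} \mathbf{X}_{js} \exp(\boldsymbol{\beta}^{T}\mathbf{X}_{js})}{\sum_{j=1}^{M_s+1} \exp(\boldsymbol{\beta}^{T}\mathbf{X}_{js})} \right],$$

with information matrix,

$$I(\boldsymbol{\beta}) = \sum_{s=1}^{N} \left[ \frac{\sum_{j=1}^{M_s+1} \mathbf{X}_{js}\mathbf{X}_{js}^{T} \exp(\boldsymbol{\beta}^{T}\mathbf{X}_{js})}{\sum_{j=1}^{M_s+1} \exp(\boldsymbol{\beta}^{T}\mathbf{X}_{js})} - \frac{\left[\sum_{j=1}^{M_s+1} \mathbf{X}_{js} \exp(\boldsymbol{\beta}^{T}\mathbf{X}_{js})\right] \left[\sum_{j=1}^{M_s+1} \mathbf{X}_{js} \exp(\boldsymbol{\beta}^{T}\mathbf{X}_{js})\right]^{T}}{\left[\sum_{j=1}^{M_s+1} \exp(\boldsymbol{\beta}^{T}\mathbf{X}_{js})\right]^{2}} \right] .$$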
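A minimal sketch of the conditional score and information, assuming the same case-first layout for each matched set. Each stratum contributes the case covariate minus a weighted mean, and a weighted covariance, with weights proportional to $\exp(\boldsymbol{\beta}^T \mathbf{X}_{js})$. The name `conditional_score_info` and the toy strata are hypothetical.

```python
import math

def conditional_score_info(beta, strata):
    """Return (U, I) for the conditional logistic likelihood, case first."""
    d = len(beta)
    U = [0.0] * d
    I = [[0.0] * d for _ in range(d)]
    for rows in strata:
        w = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in rows]
        tot = sum(w)
        # Weighted mean of X within the stratum.
        xbar = [sum(wj * row[k] for wj, row in zip(w, rows)) / tot
                for k in range(d)]
        for k in range(d):
            U[k] += rows[0][k] - xbar[k]          # U_s = X_1s - weighted mean
        for a in range(d):
            for c in range(d):
                second = sum(wj * row[a] * row[c]
                             for wj, row in zip(w, rows)) / tot
                I[a][c] += second - xbar[a] * xbar[c]   # weighted covariance
    return U, I

# Toy strata (hypothetical): one 3-to-1 set and one 1-to-1 set, scalar covariate.
strata = [
    [[2.0], [0.0], [1.0], [1.0]],
    [[1.0], [3.0]],
]
U0, I0 = conditional_score_info([0.0], strata)
```

At $\boldsymbol{\beta} = 0$ the weights are equal, so each stratum contributes the case value minus the plain stratum mean and the plain within-stratum variance.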

Conditional Residual

Then define a residual as,

$$e_{is} = Y_{is} - \frac{\exp(\hat{\boldsymbol{\beta}}^{T}\mathbf{X}_{is})}{\sum_{j=1}^{M_s+1} \exp(\hat{\boldsymbol{\beta}}^{T}\mathbf{X}_{js})}$$

where $\hat{\boldsymbol{\beta}}$ is the solution to $U(\boldsymbol{\beta}) = 0$.

=> Use these correlated Residuals to test for patterns based on location.
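A minimal sketch of the conditional residuals, assuming the case is stored first in each matched set. For illustration the coefficient is an arbitrary value rather than a fitted $\hat{\boldsymbol{\beta}}$. The name `conditional_residuals` and the toy strata are hypothetical.

```python
import math

def conditional_residuals(beta, strata):
    """e_is = Y_is - exp(beta' X_is) / sum_j exp(beta' X_js), case first."""
    out = []
    for rows in strata:
        w = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in rows]
        tot = sum(w)
        out.append([(1.0 if i == 0 else 0.0) - wi / tot
                    for i, wi in enumerate(w)])
    return out

# Toy strata (hypothetical): one 3-to-1 set and one 1-to-1 set, scalar covariate.
strata = [
    [[2.0], [0.0], [1.0], [1.0]],
    [[1.0], [3.0]],
]
e_zero = conditional_residuals([0.0], strata)    # equal weights in each set
e_other = conditional_residuals([0.7], strata)   # arbitrary illustrative beta
```

The residuals in a matched set always sum to zero, whatever the coefficient, because the fitted probabilities in a set sum to one. This is the within-stratum correlation the slide refers to, and it is why the resampling uses one multiplier per stratum rather than one per participant.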

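Conditional Cumulative Residual

Define the Conditional Cumulative Residual Moving Block Process as,

$$W_{loc}(x_1, x_2 \mid b_1, b_2) = \frac{1}{\sqrt{N}} \sum_{s=1}^{N} \sum_{i=1}^{M_s+1} I\left(x_1 - b_1 < r_{is} \le x_1,\; x_2 - b_2 < t_{is} \le x_2\right) e_{is}$$

which has been shown to be asymptotically equivalent to,

$$\hat{W}_{loc}(x_1, x_2 \mid b_1, b_2) = \frac{1}{\sqrt{N}} \sum_{s=1}^{N} \left[ \sum_{i=1}^{M_s+1} I\left(x_1 - b_1 < r_{is} \le x_1,\; x_2 - b_2 < t_{is} \le x_2\right) e_{is} - \boldsymbol{\eta}^{T}(x_1, x_2 \mid \hat{\boldsymbol{\beta}})\, I^{-1}(\hat{\boldsymbol{\beta}})\, U_s(\hat{\boldsymbol{\beta}}) \right] G_s$$

where

$$\boldsymbol{\eta}(x_1, x_2 \mid \boldsymbol{\beta}) = -\sum_{s=1}^{N} \sum_{i=1}^{M_s+1} I\left(x_1 - b_1 < r_{is} \le x_1,\; x_2 - b_2 < t_{is} \le x_2\right) \frac{\partial e_{is}}{\partial \boldsymbol{\beta}}$$

and $G_1, \ldots, G_N \overset{iid}{\sim} N(0, 1)$ are independent of the observed data.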

Significance Test

Testing the NULL

$$H_0 : Y_{is} \mid \mathbf{X}_{is} \;\perp\; (r_{is}, t_{is})$$

• Simulate N realizations of $\hat{W}_{loc}(\cdot, \cdot \mid b_1, b_2)$,

$$\hat{W}_{loc,1}(\cdot, \cdot \mid b_1, b_2), \ldots, \hat{W}_{loc,N}(\cdot, \cdot \mid b_1, b_2),$$

by repeatedly simulating $(G_1, \ldots, G_N)$, while fixing the data at their observed values.

• Calculate P-value

$$\text{P-value} = \frac{1}{N} \sum_{j=1}^{N} I\left(\hat{S}_{loc,j}(b_1, b_2) \ge S_{loc}(b_1, b_2)\right)$$

where

$$S_{loc}(b_1, b_2) = \sup_{x_1, x_2} \left| W_{loc}(x_1, x_2 \mid b_1, b_2) \right| \quad \text{and} \quad \hat{S}_{loc,j}(b_1, b_2) = \sup_{x_1, x_2} \left| \hat{W}_{loc,j}(x_1, x_2 \mid b_1, b_2) \right| .$$

Simulation

• Choice of $G_i$ or $G_s$

  Unconditional
    Normal:   $G_i \sim N(0, 1)$
    Discrete: $G_i = 1$ w.p. $1/2$, $-1$ w.p. $1/2$

  Conditional
    Normal:   $G_s \sim N(0, 1)$
    Discrete:
      1 to 1: $G_s = 1$ w.p. $1/2$, $-1$ w.p. $1/2$
      2 to 1: $G_s = 2/\sqrt{2}$ w.p. $1/3$, $-1/\sqrt{2}$ w.p. $2/3$
      3 to 1: $G_s = 3/\sqrt{3}$ w.p. $1/4$, $-1/\sqrt{3}$ w.p. $3/4$

• Type I error
• Power Calculations
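A minimal sketch of the discrete multipliers for the conditional test, checking that each choice has mean 0 and variance 1 like the standard normal alternative. Two things here are assumptions rather than content from the slides: the general form for $M$-to-1 matching ($M/\sqrt{M}$ with probability $1/(M+1)$, otherwise $-1/\sqrt{M}$) is extrapolated from the three listed cases, and the positive sign is placed on the larger value. The function names are hypothetical.

```python
import math
import random

def multiplier_values(m):
    """Two-point distribution for m-to-1 matching: (value, probability) pairs."""
    return [(m / math.sqrt(m), 1.0 / (m + 1)),
            (-1.0 / math.sqrt(m), m / (m + 1.0))]

def moments(m):
    """Exact mean and variance of the two-point multiplier."""
    vals = multiplier_values(m)
    mean = sum(v * p for v, p in vals)
    var = sum(v * v * p for v, p in vals) - mean ** 2
    return mean, var

def draw_multiplier(m, rng):
    """Draw one G_s for a stratum with m controls per case."""
    (hi, p_hi), (lo, _) = multiplier_values(m)
    return hi if rng.random() < p_hi else lo

rng = random.Random(0)
draws = [draw_multiplier(3, rng) for _ in range(1000)]
summary = {m: moments(m) for m in (1, 2, 3)}
```

A two-point multiplier with these moments gives the simulated process the same first two moments as the normal version. The two versions can still behave differently in small samples.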

Type I error

Unconditional
Generate N $x_i$ and $y_i$ from Unif(0,10).
Type I error is the percentage of simulated data sets in which a significant cluster is found.

Conditional
Generate N $x_{is}$ and $y_{is}$ from Unif(0,10).
Type I error is the percentage of simulated data sets in which a significant cluster is found.

Type I error

Unconditional

                         Number of Observations
                       Normal                 Discrete
Percent of Cases    300    500    1000     300    500    1000
20%                0.016  0.036  0.054    0.146  0.172  0.168
30%                0.024  0.044  0.054    0.136  0.154  0.138

Conditional

                            Type of Matching
                       Normal                 Discrete
Number of Cases     1:1    2:1    3:1      1:1    2:1    3:1
100                0.010  0.080  0.148    0.020  0.074  0.036
200                0.012  0.088  0.162    0.030  0.084  0.046

Power Calculations

Two Power Calculations

[Figure: study region divided into a 4 x 4 grid of numbered cells]

13  14  15  16
 9  10  11  12
 5   6   7   8
 1   2   3   4

Power Calculations

Single Hotspot

[Figure: the same 4 x 4 grid of numbered cells]

13  14  15  16
 9  10  11  12
 5   6   7   8
 1   2   3   4

Power Calculations

Multiple Hotspots

[Figure: the same 4 x 4 grid of numbered cells]

13  14  15  16
 9  10  11  12
 5   6   7   8
 1   2   3   4

Power Calculations

Unconditional

           Spatial Scan   Normal   Discrete
Single        0.958       0.964     0.976
Multi         0.852       0.916     0.932

Conditional

                                     Type of Matching
                                Normal                 Discrete
                 Number of
                 Cases       1:1    2:1    3:1      1:1    2:1    3:1
Single Cluster   100        0.606  0.766  0.828    0.706  0.758  0.750
                 200        0.886  0.964  0.990    0.908  0.950  0.982
Multi Cluster    100        0.464  0.704  0.774    0.490  0.672  0.704
                 200        0.844  0.946  0.974    0.854  0.932  0.948

Application

Study:
Kaohsiung, Taiwan Matched Case-Control Study

Method:
Conditional Cumulative Geographic Residual Test (Normal and Mixed Discrete)

Results

Odds Ratio (p-values)

                   Leukemia                        Brain Cancer
             Unadjusted     Adjusted         Unadjusted     Adjusted
Discrete    2.10 (0.055)   2.19 (0.143)     1.97 (0.058)   2.08 (0.104)
Normal      2.10 (0.050)   2.19 (0.122)     1.97 (0.052)   2.08 (0.104)

Marginally Significant Clustering for both outcomes without adjusting for smoking history.

Childhood Leukemia

[Figure: Cumulative Residuals plotted over location (axes X1, X2), with Cases, Controls, and Plants marked.
(a) Unadjusted. P-Values: Discrete = 0.055, Normal = 0.050
(b) Adjusted. P-Values: Discrete = 0.143, Normal = 0.122]

Childhood Brain Cancer

[Figure: Cumulative Residuals plotted over location (axes X1, X2), with Cases, Controls, and Plants marked.
(a) Unadjusted. P-Values: Discrete = 0.052, Normal = 0.058
(b) Adjusted. P-Values: Discrete = 0.104, Normal = 0.104]

Software

R macro to handle both unconditional and conditional data

Dataset:
• X and Y coordinates of each participant
• Case/control variable
• Covariate matrix
• Stratum variable for conditional data

Takes just a few minutes to run!

Discussion

Cumulative Geographic Residuals
• Unconditional and Conditional Methods for Binary Outcomes
• Can find multiple significant hotspots while holding type I error at appropriate levels
• Not computer intensive compared to other cluster detection methods

Taiwan Study
• Found a possible relationship between Childhood Leukemia and Petrochemical Exposure, but not with the outcome Childhood Brain Cancer

Discussion

Future Research
• Failure Time Data
• Recurrent Events
• Relocation of Study Participants
• Surveillance