ONLINE SUPPLEMENT FOR
Causal inference approaches for studying the relationship between screening and cancer outcomes
Duncan C. Thomas
Appendix 1: Details of the disease model simulation
Let i = 1,…,5,000 index the sibships and j = 1,…,5 the members. For each individual,
two measured covariates Xijv and two unobserved frailties Zijv (v=1,2) were generated as
multivariate normal deviates with correlations between the two components and among
members of the same sibship. The unobserved ages at which each of 100 polyps would
eventually arise were generated by sampling from a Weibull distribution with shape
parameter 4 and relative rate depending on Xij1 and Zij1, specifically with hazard rate
$\lambda_{ij}(t) = 3\,t^{3} R_{ij}$, where $R_{ij} = \exp(\lambda_0 + \lambda_1 X_{ij1} + \lambda_2 Z_{ij1})$,
so that ages at polyp creation $T_{ijk}^{(P)}$ are given by
$T_{ijk}^{(P)} = [-\ln(U_{ijk}) / R_{ij}]^{1/4}$, where $U_{ijk} \sim \mathrm{Uniform}(0,1)$.
The parameters of the model were adjusted so that most of these values would be
beyond an individual’s lifetime; thus, the number of polyps most individuals would
experience in their lifetimes was considerably less than the 100 simulated.
Polyps were assumed to grow as the square root of time since they first developed¹, the
mass (in terms of number of cells) at time t being
$M_{ijk}(t) = [1 + \mu_{ijk}(t - T_{ijk}^{(P)})]^{1/2}$ [1]
where $\mu_{ijk} \sim LN[\exp(\mu_0 + \mu_1 X_{ij1} + \mu_2 Z_{ij1}), V_M]$ is the growth rate parameter. Although
there is evidence that some polyps can regress², this possibility was not explicitly
modeled in the simulation.
Each cell in each polyp was considered to be at risk of conversion to a carcinoma through
an n-stage process, with each mutation occurring at rate $\nu_{ijk}$. Thus, the time from each
cell’s birth to its fully malignant conversion also has a Weibull distribution with shape
parameter n, so the hazard rate for malignant conversion is given by
$$\gamma_{ijk}(t) = \int_0^t (1 + \mu_{ijk} u)^{1/2}\, \nu_{ijk}^{n} (t - u)^{n-1}\, du \cong c_n \nu_{ijk}^{n} (1 + \mu_{ijk} t)^{n+1/2} / \mu_{ijk}^{n}$$
(where c1=2/3, c2=4/15, c3=16/105,…) and cumulative hazard
$$\Gamma_{ijk}(T) = \int_0^T \gamma_{ijk}(t)\, dt \cong C_n \nu_{ijk}^{n} (1 + \mu_{ijk} T)^{n+3/2} / \mu_{ijk}^{n+1}$$
(where C1=4/15, C2=8/105, C3=32/945,…). The elapsed time from polyp creation to
malignant conversion for that polyp is generated by solving $\Gamma_{ijk}(T) = -\ln U$, with $U \sim \mathrm{Uniform}(0,1)$, for T to
obtain
$$T_{ijk}^{(M)} = \frac{1}{\mu_{ijk}} \left\{ \left[ \frac{\mu_{ijk}^{n+1} (-\ln U)}{C_n \nu_{ijk}^{n}} \right]^{2/(2n+3)} - 1 \right\}. \quad [2]$$
Each undetected fully malignant clone is then assumed to grow exponentially at rate
ijk ~ LN[exp(0 + 1Xij1 + 2Zij1), 32] .
Each tumor is assigned a clinically diagnosable size Sijk ~ LN(0,12), so the age at which it could be clinically diagnosed is given by
Tijk(C) = Tijk
(P) + Tijk(M) + ln(Sijk)/ijk [3]
if not previously detected by screening and excised, and if no other cancers have been
previously diagnosed. (The parameter chosen for the log mean of Sijk , 0 = 10 [Table 1],
corresponds to about 22,000 cells.)
Appendix 2: Details of the screening model simulation
The age at the first recommended screening was chosen to be lognormally distributed
with mean
E(lnTij0(S)) = 0 + 2Xij2 + 3Zij2
and logarithmic standard deviation 1. The probability of being compliant (Cij0 = 1) with
that recommendation was given by a logistic function
logit Pr(Cij0=1) = 0 + 2Xij2 + 3Zij2.
The next recommended screening time for each sib was generated as a lognormal deviate
with mean similar to that described above, but with a different intercept (1 in place of
0) and additional covariates, including the square root of the total number of polyps
detected on the last completed screen and indicator variables for whether any of the
individual’s siblings had completed at least one scan by then, whether any polyps had
been found, and whether any of them had had a cancer diagnosis. The probabilities of
compliance ijs = Pr(Cijs = 1) were also expanded to include terms for each of these
additional covariates, as well as a different intercept term 1 in place of 0.
Since each member’s next screening time depends upon the outcome of the entire process
for the other family members, this necessitated processing the entire family concurrently.
At the time of each completed scan, the outcome variables for the entire family (numbers
of already completed scans, positive scans, and members with a clinically diagnosed
cancer) as of that time were computed and used to determine the next screening time for
each family member. The earliest of these was then processed next, continuing in this
manner until all family members reached their pre-specified ages at censoring.
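The concurrent processing of a family amounts to repeatedly picking the member whose next scheduled screen comes first. A minimal sketch of that selection step is given below; the function name and the array layout are hypothetical and are not taken from the program in Appendix 7.

```c
#include <assert.h>

/* Index of the family member whose next scheduled screening comes first,
   skipping members whose next screen falls at or after their censoring age.
   Returns -1 when no member has a pending screen. */
int next_member_to_screen(const double *next_screen_age,
                          const double *censoring_age, int n_members)
{
    assert(n_members > 0);
    int best = -1;
    for (int m = 0; m < n_members; m++) {
        if (next_screen_age[m] >= censoring_age[m]) continue;
        if (best < 0 || next_screen_age[m] < next_screen_age[best]) best = m;
    }
    return best;
}
```

After the selected member's screen is processed, the family-level summaries would be updated and every member's next screening time regenerated before the function is called again.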
At each screening time $T_{ijs}^{(S)}$ at which an individual is compliant, the size $M_{ijk}(T_{ijs}^{(S)})$ of
each polyp that has not previously been excised is computed using Eq.[1] and its
probability of being detected (the event Dijks = 1) is assumed to be a logistic function of
its current linear dimension (the cube root of its mass):
$\mathrm{logit}[\Pr(D_{ijks} = 1)] = \psi_0 + \psi_1 [M_{ijk}(T_{ijs}^{(S)})]^{1/3}$
If detected, that polyp is excised and considered no longer at risk of generating a
malignant clone and is not screened again.
The age at which each tumor could be diagnosed is given by Eq.[3], which does not
depend upon the screening process, so the time of an individual’s cancer diagnosis Tij(D) ,
conditional on the screening history is
Tij(D) = mink(Tijk
(P) + Tijk(M) + Sijk/ijk)
the minimum being taken over all so-far undiscovered polyps k (i.e., those with Dijks = 0
for all prior screening times Tijs(S) < Tij(D)).
Appendix 3: Propensity scores used for simulated and real data
Following the notation of Hernan and Robins³⁻⁵, we begin by fitting a logistic model for
the probability that an individual is screened at each age:
$\mathrm{logit}\,\Pr[A_i(t)=1 \mid V_i, L_i(t)] = \alpha_0(t) + V_i'\alpha_1 + A_i(t^-)'\alpha_2 + L_i(t^-)'\alpha_3$
where Ai(t) is an indicator for whether an individual i is screened at age t, Ai(t-) denotes
the history of screening prior to t, Li(t-) the history of polyps prior to t, and Vi is a
vector of baseline covariates. The baseline risk $\alpha_0(t)$ is modeled parametrically, here as a
cubic polynomial although other specifications would be possible. The model is fitted by
creating a dataset in which each individual is represented once for every year up to
censoring or cancer diagnosis and then fitted as if these observations were independent.
It is fitted twice, once with Ai(t-), Li(t-) and Vi to produce $\hat\alpha^{(1)}$, and once with only t to
produce $\hat\alpha^{(0)}$. “Stabilized propensity score weights” swi(t) are then computed using these
estimates as
$$sw_i(t) = \prod_{u=0}^{t} \frac{\Pr[A_i(u) \mid t_i(u), V_i; \hat\alpha^{(0)}]}{\Pr[A_i(u) \mid t_i(u), A_i(u-1), L_i(u-1), V_i; \hat\alpha^{(1)}]}$$
The denominator effectively weights each individual cancer case and control inversely by
the probability of their observed screening history up to that time, thereby creating a
pseudo-population in which cancer outcomes are not confounded by the determinants of
screening history. Since these denominators can be highly variable, the numerators serve
to stabilize these weights, leading to more stable estimates and smaller standard errors.
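The running product that defines the stabilized weights can be sketched as below. This is an illustration only: the function names are hypothetical, and the fitted probabilities from the two logistic models are assumed to have been computed already and stored per year.

```c
#include <assert.h>

/* Probability of the observed screening status a (0 or 1) in one year,
   given the fitted probability of being screened in that year. */
double prob_observed(int a, double p_screened)
{
    return a ? p_screened : 1.0 - p_screened;
}

/* Stabilized weight at year t: product over u = 0..t of p_num[u] / p_den[u],
   where p_num[u] and p_den[u] are the probabilities of the screening status
   actually observed in year u under the reduced and the full propensity
   model, respectively. */
double stabilized_weight(const double *p_num, const double *p_den, int t)
{
    double sw = 1.0;
    for (int u = 0; u <= t; u++) {
        assert(p_num[u] > 0.0 && p_den[u] > 0.0);
        sw *= p_num[u] / p_den[u];
    }
    return sw;
}
```

Because the weight is a cumulative product, it can also be updated incrementally, one year at a time, as each person-year record is processed.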
A standard nested case-control design is used to assess cancer risk in relation to screening
history. For each cancer case i at age ti, one (or more) control(s) is selected at random
from the risk set R(ti), the set of subjects at risk and disease free prior to the case’s age at
cancer diagnosis, hereafter called the “reference date” (the time at comparison for each
case and matched controls). A standard conditional logistic regression is then used to
estimate $\beta$, except that each subject’s contribution is weighted by their stabilized
propensity scores. Thus, the score contribution for the conditional likelihood of cancer
becomes
$$U(\beta) = \sum_i \left( Z_{i0}(t_i) - \frac{\sum_j Z_{ij}(t_i)\, e^{Z_{ij}(t_i)'\beta}\, sw_{ij}(t_i)}{\sum_j e^{Z_{ij}(t_i)'\beta}\, sw_{ij}(t_i)} \right)$$
the sums in the numerator and denominator being taken over the case (j = 0) and matched
control(s) (j ≥ 1). Here, Zij(t) denotes the vector of covariates related to cancer risk,
including those related to the number of completed or positive screens during some
specified window of times prior to the reference date t and $\beta$ is the corresponding vector
of log-RR coefficients.
Appendix 4: Calculation of target ages at first screen and intervals between screens based on fixed covariates, family history, or both
The cumulative 10-year risk of cancer at age t in the general population, ignoring
competing risks, is given by
$P(t+10 \mid t) = 1 - \exp[-\Lambda(t+10 \mid t)]$
where $\Lambda(t_2 \mid t_1) = \int_{t_1}^{t_2} \lambda(u)\, du$ is the cumulative hazard based on population age (and sex)
specific incidence rates $\lambda(u)$. The corresponding personal 10-year risk for an individual
i with risk factors Zi(t) at time t is given by
$P_i(t+10 \mid t) = 1 - \exp[-\Lambda_0(t+10 \mid t)\, r_i]$
where $\Lambda_0(t)$ is the cumulative baseline hazard rate for an individual with $Z_i(t) \equiv 0$ and
$r_i = \exp[Z_i(t)'\beta]$ is the individual’s personal relative risk. Since the population hazard rate is
simply the average over all the individuals’ hazard rates,
$\lambda(t) = E[\lambda_0(t)\, r_i] = E[\lambda_0(t) \exp(Z_i(t)'\beta)] = \lambda_0(t)\, \bar r(t)$
where $\bar r(t)$ is the average of the individual relative risks among survivors at age t. Hence
we can re-express personal risk in terms of the population rates as
$P_i(t+10 \mid t) = 1 - \exp[-\Lambda(t+10 \mid t)\, r_i / \bar r(t)]$
To find the age at which personal 10-year risk equals the population average risk at age
50, we must therefore solve the equation
$\Lambda(t+10 \mid t)\, r_i / \bar r(t) = \Lambda(60 \mid 50)$
Since $\bar r(t)$ varies only slowly with age, due to the survival of the fittest effect, the left-hand
side is essentially monotonic in t, so the equation is easily solved numerically.
The same technique is used to find the time interval $\Delta t$ at which the personal risk
$P_i(t + \Delta t \mid t) = P(t+10 \mid t)$, the population average 10-year risk (or 5-year risk for those
with polyps found on the previous screen), incorporating the number of previous exams
as one of the time-dependent risk factors in the disease model. For simulating disease
outcomes under different screening programs, the 1-year risk of cancer for an individual
with covariate values Zi(t) at time t is simply $P_i(t+1 \mid t) = 1 - \exp[-\lambda(t)\, r_i / \bar r(t)]$.
For the real data application, the baseline rates were computed year-by-year, based on the
fitted disease relative risk and the population age- and sex-specific rates for the German
North Rhine-Westphalia registry²¹.
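The numerical solution for the target age at first screen described in this appendix can be sketched as follows. The sketch assumes annual rates indexed by integer age, so that a simple forward scan over ages can stand in for a general root finder; the names `cum_hazard`, `target_age_at_first_screen`, and `MAX_AGE` are hypothetical and this is not the routine TargetAgeAtFirstScreen() from the program in Appendix 7.

```c
#include <assert.h>

#define MAX_AGE 100

/* Cumulative hazard between integer ages t1 and t2: sum of annual rates. */
double cum_hazard(const double *rate, int t1, int t2)
{
    assert(0 <= t1 && t1 <= t2 && t2 <= MAX_AGE);
    double total = 0.0;
    for (int a = t1; a < t2; a++) total += rate[a];
    return total;
}

/* Earliest integer age t at which the personal 10-year cumulative hazard,
   cum_hazard(t, t+10) * r_i / rbar[t], reaches the population value for
   ages 50 to 60. Returns -1 if it is never reached by age MAX_AGE - 10. */
int target_age_at_first_screen(const double *rate, const double *rbar, double r_i)
{
    double target = cum_hazard(rate, 50, 60);
    for (int t = 0; t + 10 <= MAX_AGE; t++) {
        if (cum_hazard(rate, t, t + 10) * r_i / rbar[t] >= target) return t;
    }
    return -1;
}
```

Because equal cumulative hazards imply equal risks, the comparison is made on the hazard scale. A subject at average risk (r_i equal to the mean relative risk) is assigned age 50 by construction, and higher-risk subjects are assigned earlier ages.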
Appendix 5: Details of the DACHS case-control study data
The DACHS study⁶,⁷ is an ongoing population-based case-control study from Germany.
In this analysis, 4334 cases of colorectal cancer and 4231 controls recruited between
2003 and 2013 were included, with exquisitely detailed information on screening
histories for colorectal adenomas (polyps), as well as various risk factors (sex, schooling,
ever regular smoking, ever participation in a general health check-up, BMI 5-14 years
prior to the reference date, average METs, regular NSAIDS use, HRT, and statins) and
family history of colorectal cancer in first and second-degree relatives.
The screening history data for the index subjects (study cases and controls) included only
the years of the first colonoscopy and the most recent three, plus the total number of
colonoscopies, so years for any screens beyond 4 were uniformly spaced between the first
and the third-to-last ones. To avoid overlapping or closely adjacent exams, the total
number of exams was capped at 9. Exams with indication coded as “conspicuous fecal
occult blood test (FOBT) result” or “other” were excluded. Missing dates were assigned
at random between age 40 and the reference age (or interpolated between available
dates). In addition to colonoscopies, information on other screening modalities (e.g.,
FOBT) and type of exam (sigmoidoscopy or full colonoscopy) were available but were
not used in this analysis. No information was asked on screening of family members.
Year of birth, diagnosis, and death were available for each first-degree relative and for
grandparents. For aunts and uncles, only the numbers at risk and numbers affected were
available. Dates for these relatives were assigned by random sampling from the age
distribution of cancer and assuming aunts and uncles were born 25 years prior to the
index subject and grandparents 50 years prior. Again, missing years of cancer or birth for
other relatives were assigned at random, based on the index subject’s year of birth.
In addition to the use of causal inference methods, the analyses reported here differ from
those published earlier in a number of details, notably by using the total number of
screening colonoscopies over various intervals prior to the reference date rather than a
simple binary indicator for any colonoscopies over the interval 1–10 years prior. We also
carried out analyses using binary indicator variables; as the count variables we used here
have a wider range than the binary one, effect sizes (per screen) are correspondingly
smaller than those for the ever/never variable, but the comparisons across methods were
essentially the same (results not shown). As before, we excluded subjects with
inflammatory bowel disease, but did not exclude those under 50, or those with a most
recent exam less than one or more than 10 years previous, as we wished to assess the
effect of the entire screening history. Subjects with missing covariate values were
excluded, leaving 4065 cases and 3025 controls for fitting the disease model. These
subjects contributed 2191 screening events (1499 initial and 693 subsequent) over a total
of 346,940 person-years of observation. Of the 2,169 exams for which we had
information about results (the first and three most recent), 802 yielded polyps. We also
ran analyses replacing missing covariate values with means, with and without including a
missingness indicator variable in the screening, polyp detection, and disease models,
thereby increasing the sample size to 4,301 cases and 4,215 controls with 3,181 screens
in 419,771 person-years and 1,108 polyps discovered. The results shown in Figures 3, 5
and 6 and Tables 4 and 5 differed very little, the only important difference being
somewhat larger estimates of the observed screening effects, as shown in Supplementary
Figure 1. The comparison between analysis with and without propensity score weighting
remained the same, however.
Because the case-control study over-represents cases, fitting the propensity score model
and predicting the outcomes of counterfactual screening programs requires that cases and
controls be weighted appropriately. As described by Vanderweele and Vansteelandt⁸, the
appropriate weights are ratios of the population proportions of cases and subjects at risk
at each age to the corresponding numbers of cases and controls in the sample. Thus, for
age-matched nested case-control studies, cases are weighted by π(t) / p(t) and controls by
[1 − π(t)] / [1 − p(t)], where π(t) denotes the population proportion of cases at age t and
p(t) denotes the proportion of cases at age t out of all cases and
controls at that age. Use of sampling weights slightly reduced the difference between
unadjusted and IPSW adjusted estimates of the relative risk for screening, but had
relatively little effect on the comparison of cancer outcomes under alternative screening
programs. As expected, there are many fewer predicted cancers after using the sampling
weights, reflecting the lower weight given to the cases, who tend to be at higher risk than
controls, but again the ranking of the predicted cancer outcomes under alternative
screening programs remained unchanged. Tables 4 and 5 and Figures 3–6 are all based
on the weighted estimates.
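The sampling weights described above are simple ratios and might be computed as in the following sketch. The struct and function names are hypothetical; the two inputs are the proportion of cases in the population at risk at a given age and the proportion of cases among the sampled cases and controls at that age.

```c
#include <assert.h>

typedef struct {
    double case_weight;
    double control_weight;
} SamplingWeights;

/* pop_prop:    proportion of cases in the population at risk at a given age;
   sample_prop: proportion of cases among sampled cases and controls at that age. */
SamplingWeights sampling_weights(double pop_prop, double sample_prop)
{
    assert(pop_prop >= 0.0 && pop_prop < 1.0);
    assert(sample_prop > 0.0 && sample_prop < 1.0);
    SamplingWeights w;
    w.case_weight = pop_prop / sample_prop;
    w.control_weight = (1.0 - pop_prop) / (1.0 - sample_prop);
    return w;
}
```

When cases make up half of the sample but only a small fraction of the population, cases receive a weight far below one and controls a weight close to two, which is why far fewer cancers are predicted after weighting.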
Appendix 6: Estimation of counter-factual comparisons from the real data
Outcomes under counter-factual screening programs were computed by simulation using
the fitted polyp detection and disease diagnosis models. At each age, a random choice of
whether cancer occurs or not was made with the corresponding probability and if not, the
process advanced to the next age. Any simulated screening and polyp history events after
cancer incidence are then discarded. Competing risks were included using population
death rates for all Germany from the vital statistics for 2013, subtracting colorectal cancer
incidence and assuming no covariate effects.
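The year-by-year simulation with competing risks can be sketched as below. To keep the routine deterministic, the uniform random draws are supplied by the caller; the names, the outcome codes, and the choice to check cancer before death within a year are all assumptions of this sketch rather than details taken from the original program.

```c
#include <assert.h>

enum { CANCER_FREE = 0, CANCER = 1, DEATH = 2 };

typedef struct {
    int status;  /* CANCER_FREE, CANCER, or DEATH */
    int age;     /* age at the event, or end_age if no event occurred */
} Outcome;

/* Advances one subject year by year from start_age up to (not including)
   end_age. p_cancer[a] and p_death[a] are annual probabilities at age a;
   u_cancer[a] and u_death[a] are uniform(0,1) draws supplied by the caller.
   Cancer is checked before death within each year (an assumed ordering). */
Outcome simulate_subject(const double *p_cancer, const double *p_death,
                         const double *u_cancer, const double *u_death,
                         int start_age, int end_age)
{
    assert(start_age >= 0 && start_age <= end_age);
    Outcome out = { CANCER_FREE, end_age };
    for (int a = start_age; a < end_age; a++) {
        if (u_cancer[a] < p_cancer[a]) { out.status = CANCER; out.age = a; break; }
        if (u_death[a] < p_death[a])   { out.status = DEATH;  out.age = a; break; }
    }
    return out;
}
```

Any screening or polyp events simulated for ages after the returned event age would then be discarded, as described above.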
For the analysis of the marginal statistics provided in Table 4, a random effects model
was used to estimate the average causal effect of screening. Letting Yij denote the
simulated outcome for the jth replicate of subject i, we assumed $Y_{ij} \sim N(\mu_i, s_i^2)$ and
$\mu_i \sim N(\mu, \sigma^2)$. Maximum likelihood estimates of the parameter of interest $\mu$ and its
asymptotic variance are easily found by iterating between estimating $\mu$ and $\sigma^2$ as
described by Stram⁹. For the binary outcome of cancer, the binomial variance estimator
$s_i^2 = Y_{i+}(1 - Y_{i+}) / R$ was used (where Yi+ denotes the total number of simulated cancer
outcomes out of R = 1000 replicates); for the other outcome variables (number of screens,
number of polyps, age at cancer), the empirical variance across replicates for each
individual was used for $s_i^2$.
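One way to carry out the alternating estimation of the overall mean and the between-subject variance is sketched below. This is a generic fixed-point iteration whose stationary point satisfies the normal-model likelihood equations; it is offered as an illustration and is not claimed to be the exact algorithm of Stram⁹. The names are hypothetical, each subject is summarized by a mean outcome y[i] with sampling variance s2[i], and the sampling variances are assumed to be strictly positive.

```c
#include <assert.h>
#include <math.h>

typedef struct {
    double mu;      /* estimated overall mean */
    double sigma2;  /* estimated between-subject variance */
    double var_mu;  /* approximate variance of the estimated mean */
} RandomEffectsFit;

/* y[i]: per-subject mean outcome; s2[i]: its sampling variance (> 0).
   Alternates between the weighted mean for mu and a fixed-point update for
   sigma2, using weights w_i = 1 / (sigma2 + s2[i]). */
RandomEffectsFit fit_random_effects(const double *y, const double *s2,
                                    int n, int max_iter)
{
    assert(n > 0 && max_iter > 0);
    RandomEffectsFit fit = { 0.0, 0.0, 0.0 };
    for (int it = 0; it < max_iter; it++) {
        double sum_w = 0.0, sum_wy = 0.0, num = 0.0, den = 0.0;
        for (int i = 0; i < n; i++) {
            assert(s2[i] > 0.0);
            double w = 1.0 / (fit.sigma2 + s2[i]);
            sum_w += w;
            sum_wy += w * y[i];
        }
        fit.mu = sum_wy / sum_w;
        for (int i = 0; i < n; i++) {
            double w = 1.0 / (fit.sigma2 + s2[i]);
            double d = y[i] - fit.mu;
            num += w * w * (d * d - s2[i]);
            den += w * w;
        }
        double updated = num / den;
        if (updated < 0.0) updated = 0.0;  /* variance cannot be negative */
        double change = fabs(updated - fit.sigma2);
        fit.sigma2 = updated;
        if (change < 1e-12) break;
    }
    double sum_w = 0.0;
    for (int i = 0; i < n; i++) sum_w += 1.0 / (fit.sigma2 + s2[i]);
    fit.var_mu = 1.0 / sum_w;  /* inverse of the total weight */
    return fit;
}
```

When all sampling variances are equal, the estimate of the mean reduces to the simple average and the between-subject variance to the empirical variance of the subject means minus the common sampling variance.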
Supplementary Table 1: Simulated parameter values

Parameter family                                    Value     Interpretation
Model for log rate of polyp development             -16.5     intercept
                                                    3.0       Weibull shape
                                                    0.4       regression on X1
                                                    1.0       regression on Z1
Model for log time to next screen                   3.9       intercept for age at first screen
                                                    2.3       intercept for subsequent intervals
                                                    0.1       SD of log age at first screen
                                                    0.1       SD of log interval time
                                                    -0.1      regression on X2
                                                    -0.1      regression on Z2
                                                    -0.1      regression on N previous polyps
                                                    -0.02     regression on sibs’ N screens
                                                    -0.04     regression on sibs’ positive screens
                                                    -0.06     regression on sibs with cancer
Model for logit probability of compliance           -0.8      intercept for first screen
                                                    -1.0      intercept for subsequent screens
                                                    0.5       regression on X2
                                                    0.5       regression on Z2
                                                    1.0       regression on N previous polyps
                                                    0.02      regression on sibs’ N screens
                                                    0.04      regression on sibs’ positive screens
                                                    0.06      regression on sibs with cancer
Model for polyp growth rate                         6.5       intercept
                                                    0.1       regression on X1
                                                    0.15      regression on Z1
                                                    0.1       SD of log growth rate
Model for log cancer growth rate                    1.5       intercept
                                                    0.1       regression on X1
                                                    0.15      regression on Z1
                                                    0.1       SD of log growth rate
Model for logit probability of polyp detection      -13       intercept
                                                    3.0       regression on cube root of tumor mass
Model for log malignant conversion rate             -6.875    intercept
(n = 3 mutations)                                   0.1       regression on X1
                                                    0.15      regression on Z1
                                                    0.1       SD of log mutation rate
Model for clinically detectable tumor size          10.0      log mean number of cells
                                                    0.5       SD log number of cells
Supplementary Table 2: Expected number of clinically diagnosed or screen detected cancers per 100,000 under various counterfactual policies (analysis window 1-10 years prior to reference date, simulated data)

                          Clinically diagnosed cancers      Screen-detected cancers
                          No cancer       Cancer            No cancer       Cancer
Outcome of observed history (rows) by outcome of untargeted screening program (columns)
  No cancer               92,832          916               99,860          112
  Cancer                  3,544           2,708             28              0
Outcome of observed history (rows) by outcome of targeted screening program (columns)
  No cancer               92,768          980               99,852          120
  Cancer                  3,844           2,408             28              0
Outcome of untargeted screening program (rows) by outcome of targeted screening program (columns)
  No cancer               94,944          1,432             99,768          120
  Cancer                  1,668           1,956             112             0
Supplementary Table 3: Summary of log relative risk estimates from the fitted models (DACHS data)
Columns: Risk factor; Screening propensity; Polyp detection; Disease risk without PS weights; Disease risk with PS weights. All entries are lnRR (S.E.).
sex  0.487 (0.062)  0.541 (0.140)  -0.309 (0.066)  -0.306 (0.067)
schooling  0.082 (0.026)  -0.240 (0.034)  -0.234 (0.034)
ever regular smoker  0.072 (0.033)  0.168 (0.069)  0.220 (0.037)  0.198 (0.038)
ever health check-up  0.268 (0.083)  0.266 (0.196)  -0.483 (0.080)  -0.479 (0.081)
Body mass index (BMI) 5-14 yr earlier (x 10)  0.018 (0.012)  0.558 (0.067)  0.054 (0.007)
Physical activity (Metabolic Equivalents of Task, METs) (x 1000)  -0.507 (0.186)  -0.702 (0.226)  -0.001 (0.000)
alcohol (x 1000)  0.149 (0.108)  0.269 (0.143)  0.003 (0.001)
nonsteroidal anti-inflammatory drugs (NSAIDS)  -0.247 (0.043)  -0.335 (0.058)  -0.345 (0.059)
hormone replacement treatment (HRT)  0.503 (0.071)  0.210 (0.168)  -0.615 (0.083)  -0.605 (0.084)
statins  0.054 (0.047)  -0.157 (0.068)  -0.134 (0.069)
family history  0.374 (0.042)  0.220 (0.089)  0.405 (0.065)  -0.855 (0.061)
total screens  0.425 (0.050)  -0.132 (0.075)
first screen positive  0.021  -0.110  0.736 (0.247)
last screen positive  0.512 (0.117)  0.639 (0.251)
number of other positive screens  -0.512 (0.137)  0.309 (0.240)
interval screens (1-10 years prior to reference date)  -0.697 (0.054)  -0.855 (0.061)
Time since last exam (TSLE)  0.130 (0.054)
TSLE²  -0.023 (0.008)
Supplementary Figure 1: Distribution of stabilized propensity scores by screening history subgroups (DACHS data); the inverse of these scores is used as the weights. The distributions have mean ± standard deviation as follows: no screening, 0.85 ± 0.10; always negative, 0.93 ± 0.12; ever positive, 1.17 ± 0.37.
[Figure: histogram. Horizontal axis: propensity score (0.21 to 4.81); vertical axis: 0 to 0.7; series: No screening, Negative, Positive.]
Supplementary Figure 2: Log RR estimates per screen for the effect of colonoscopies on colorectal cancer risk over various windows of time prior to the reference date (DACHS data); IPSW = inverse propensity score weighting. A: Including subjects with missing covariate values replaced by means; B: Same, plus including a missingness indicator variable. These plots can be compared with those in Figure 3 of the main text.
Supplementary references
1. Dowty JG, Byrnes GB, Gertig DM. The time-evolution of DCIS size distributions with applications to breast cancer growth and progression. Math Med Biol 2014; 31: 353-64.
2. Pickhardt PJ, Kim DH, Pooler BD, et al. Assessment of volumetric growth rates of small colorectal polyps with CT colonography: a longitudinal study of natural history. Lancet Oncol 2013; 14: 711-20.
3. Robins JM, Hernan MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000; 11: 550-60.
4. Hernan MA, Alonso A, Logan R, et al. Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease. Epidemiology 2008; 19: 766-79.
5. Hernan MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology 2000; 11: 561-70.
6. Brenner H, Chang-Claude J, Jansen L, et al. Reduced risk of colorectal cancer up to 10 years after screening, surveillance, or diagnostic colonoscopy. Gastroenterology 2014; 146: 709-17.
7. Brenner H, Chang-Claude J, Jansen L, et al. Colorectal cancers occurring after colonoscopy with polyp detection: sites of polyps and sites of cancers. Int J Cancer 2013; 133: 1672-9.
8. Vanderweele TJ, Vansteelandt S. Odds ratios for mediation analysis for a dichotomous outcome. Am J Epidemiol 2010; 172: 1339-48.
9. Stram DO. Meta-analysis of published data using a linear mixed-effects model. Biometrics 1996; 52: 536-44.
Appendix 7: C code for simulation and analysis of simulated data
What follows is the computer code that generated the simulation and analysis results
provided in Table 1, Figure 2, and Supplementary Table 2. The parameters provided in
Supplementary Table 1 are entered into the program in the constants paragraphs at the
beginning of the program. The “d” arrays that follow each of these specify the
random deviations around these values that were used when multiple replicates were
run to investigate the sensitivity of results to changes in these parameters. This code
was adapted to generate the results from the DACHS study data, replacing the
simulation routines by procedures to read in the real data and perform various
imputations of missing values as described in Appendix 5, along with some further
modifications to replace the unobserved disease process (which is known in the
simulations but not observable in real data) by randomly generated data based on the
fitted disease incidence and polyp detection models as explained in Appendix 6. The
cases and controls for the real data were reweighted to account for their different
sampling probabilities, as explained in Appendix 5. (The real data analysis program is
tailor-made for that dataset, but can be provided to interested users by request to the
author, as some guidance will doubtless be needed to adapt it to other datasets.)
The program was compiled using Microsoft Visual Studio C++ version 6.0 and run on a
MacBook Pro under the VMware Fusion emulator of the Windows XP operating
system. The code should require very little modification to run under other operating
systems and compilers, however.
The core simulation routines are contained in Simulate() and the analysis routines in
Analyze(), both called from main() within a loop over multiple replicates. Both
routines use GetScreeningCovariates() and GetDiseaseCovariates() to get the
relevant covariate values at each point in time for each subject. The routine
NextScreen() is used to simulate the intervals between screens. A similar routine
ProgramNextScreen() is called by Analyze() as part of the counterfactual
comparisons, which in turn relies on fitted logistic regression models to determine the
TargetAgeAtFirstScreen() and TimeToAverageRisk(). These are called from
PredictCancer() for each of the counterfactual screening programs and the results are
tabulated in Counterfactuals(). Analyze() begins by drawing a nested case-control
sample in CaseControlSampling(), then for each “analysis window” calls
EstimatePropensityScores() and calls ConditionalLogisticRegression() to fit
the odds ratio models, with and without using the inverse probability weights. These
estimates are what are used by Counterfactuals() to compute the various summary
statistics (clinically diagnosed and screen-detected cancers, false negative and false
positive screens, number needed to screen to prevent one cancer, etc.) under each
counterfactual screening program. That routine also tabulates the pairwise comparisons
of outcomes under different screening programs shown in Supplementary Table 2.
#include
#include
#include
#include
#include
#include
#include "gamma.h"
const int F = 5000, // number of families
M = 5, // number of sibs per family
P = 100, // maximum number of polyps in unobserved history
S = 40, // maximum number of screening times
MaxNN = 1350000, // maximum possible size of propensity score dataset
{{0,0,0,0, 0,0,0,0, 0,0,0,0},
{0,0,0,0, 0,0,0,0, 0,0,0,0},
{1,1,0,0, 1,1,0,0, 1,1,0,0},
{1,0,1,0, 1,0,1,0, 1,0,0,0},
{1,1,1,1, 1,1,0,0, 1,1,0,0}},
{{0,0,0,0, 0,0,0,0, 0,0,0,0},
{0,0,0,0, 0,0,0,0, 0,0,0,0},
{1,0,0,0, 1,0,0,0, 0,0,0,0},
{1,0,0,0, 1,0,0,0, 0,0,0,0},
{1,0,0,0, 1,0,0,0, 0,0,0,0}},
{{0,0,0,0, 0,0,0,0, 0,0,0,0},
{0,0,0,0, 0,0,0,0, 0,0,0,0},
{1,1,0,0, 1,0,0,0, 0,0,0,0},
{1,0,0,0, 1,0,0,0, 0,0,0,0},
{1,1,0,0, 1,1,0,0, 0,0,0,0}}},
NumRepl = 1;
const double SDFshared = 1,
SDFcorr = 1,
SDFindep = 1,
SDXshared = 1,
SDXcorr = 1,
SDXindep = 1,
lambda0[4] = // model for log rate of polyp development
{ -16.5, // intercept
3, // Weibull shape
0.4, // regression on TotalXforPolyps
1}, // regression on TotalFrailtyForPolyps
dlambda[4] = {0.02, 0.00, 0.005, 0.005},
sigma0[11] = // model for ln time to next screen
{ 3.912, // intercept for first screen
2.3, // intercept for subsequent screens
0.1, // LSD for first screen
0.1, // LSD for subsequent screens,
-0.1, // regression on TotalXforScreening
-0.1, // regression on TotalfrailtyForScreening
-0.1, // regression on number of previous polyps
-0.02, // regression on sib screened
-0.04, // regression on sib with polyps
-0.06, // regression on sib with cancer
+0.1}, // patch so very old people don't get screened too often
dsigma[11] = {0.02, 0.02, 0.002, 0.002, 0.002, 0.002, 0.002, 0.0004, 0.0012, 0.002, 0.02},
pi0[10] = // model for logit Pr(compliance with next recommended screen)
{ -0.8, // intercept for first screen
-1, // intercept for subsequent screens
0, // (not used)
0, // (not used)
0.5, // regression on TotalXforScreening
0.5, // regression on TotalfrailtyForScreening
1.0, // regression on number of previous polyps
0.02, // regression on sib screened
0.04, // regression on sib with polyps
0.06}, // regression on sib with cancer
dpi[10] = { 0.02, 0.02, 0, 0, 0.01, 0.01, 0.02, 0.0004, 0.0012, 0.002},
mu0[4] = // model for log polyp growth rate
{ 6.5, // intercept
0.1, // regression on TotalXforPolyps
0.15, // regression on TotalFrailtyForPolyps
0.1}, // SD of log polyp growth rates
dmu[4] = {0.02, 0.001, 0.001, 0.002},
nu0[4] = // model for log cancer growth rate
{ 1.5, // intercept
0.1, // regression on TotalXforPolyps
0.15, // regression on TotalFrailtyForPolyps
0.1}, // SD log cancer growth rate
dnu[4] = {0.05, 0.001, 0.001, 0.002},
psi0[2] = // model for logit probability of polyp detection
{ -13, // intercept
3.0}, // regression on polyp size
dpsi[2] = {0.2, 0.1},
omega0[4] = // polyp to cancer mutation rate (per polyp cell per year)
{ -6.875, // intercept
0.1, // regression on TotalXforPolyps
0.15, // regression on TotalFrailtyForPolyps
0.1}, // SD of log mutation rate
domega[4] = {0.05, 0.01, 0.01, 0.02},
lnAvgSizeAtCaDx0= 10, // ln average size of polyp at cancer diagnosis
dlnAvgSizeAtCaDx= 0.01,
SDlnSize0 = 0.5, // SD ln size of polyp at cancer diagnosis
dSDlnSize = 0.01,
FNwindow = 10; // years following screen used to define a "false negative"
double lambda[4],sigma[11],pi[10],mu[4],nu[4],psi[2],omega[4],lnAvgSizeAtCaDx,SDlnSize,coeff[4][5];
int a, // index for age = 1,…,100
f, // index for families = 1,…,F
m, // index for sibs within families = 1,…,M
p, // index for polyps for each case = 1,…,P
s, // index for screening times = 1,…,S
v,v1,v2, // index for covariates (v1,v2 for pairs of covariates in information matrix)
repl; // index for replicates = 1,…,NumRepl
int Ndetected[F][M][S];
double SharedFrailty[F][V],CorrelatedFrailty[F][M],IndependentFrailty[F][M][V],TotalFrailtyForPolyps[F][M],TotalFrailtyForScreening[F][M];
double SharedX[F][V],CorrelatedX[F][M],IndependentX[F][M][V],TotalXforPolyps[F][M],TotalXforScreening[F][M];
double AgeOfPolyp[F][M][P],PolypGrowthRate[F][M][P],AgeOfMutation[F][M][P];
double AgeAtScreen[F][M][S];
double AgeAtCancer[F][M],AgeAtCancerWithoutScreening[F][M],AgeOfTumor[F][M][P],ProgramAgeAtCancer[F][M];
double AgeAtCensoring[F][M];
double CancerGrowthRate[F][M][P];
int ScreenCompliant[F][M][S];
int NumScreens[F][M],NumScreensCompleted[F][M],NumUncensoredScreens[F][M],Cancer[F][M];
int ProgramNumScreens[F][M];
int PolypDetected[F][M][P],ProgramPolypDetected[F][M][P],ScreenPositive[F][M][S];
double AgePolypDetected[F][M][P],ProgramAgePolypDetected[F][M][P];
double MeanAgeOfPolyps,VarAgeOfPolyps,MeanUncensoredNPolyps,
MeanTimeToMutation,VarTimeToMutation,MeanUncensoredNMutations,TotalNumMutations,
MeanTimeToCancer,VarTimeToCancer,
NumScreened[S],NPolypsScreened[S],MeanAgeAtScreen[S],MeanNCompliant[S];
int NumPolypsByAge[100],NumMutationsByAge[100];
int IncidentCancers[100],NumUncensored[100],UncensoredCancers[100];
int iter,output,rep,pp;
double MeanGrowthRate[S],MeanPdetectPolyp[S],MeanPolypSizeAtScreening[S],MeanTimeFromPolypToDetection[S],MeanPolypSizeDetected[S],TotalPolypsDetected[S];
int pair,Npair,CaseFamily[MaxPair],CaseMember[MaxPair],ControlFamily[MaxPair],ControlMember[MaxPair];
double PairAge[MaxPair],DiseaseCovar[MaxPair][2][MaxDiseaseCovar],beta[NumDiseaseCovar],SEbeta[NumDiseaseCovar];
double ScreeningCovar[MaxScreeningCovar];
double ScreeningBaselineHazard[100],CumulativeBaselineHazard[100],num[100],den[100];
double PropensityScore[F][M][100],MeanPropensityScore[100];
double MeanNdetected[100],MeanSibsScreened[100],MeanSibsPolyps[100],MeanSibsCancer[100],Nmeans[100];
double PredNdetected[100],PredSibsScreened[100],PredSibsPolyps[100],PredSibsCancer[100];
int nn,NNtot,NN[F][M][100],ScreenedAtAge[MaxNN];
double ZZ[MaxNN][NumVar],Alpha1[NumVar],Alpha2[NumVar],SEalpha1[NumVar],SEalpha2[NumVar];
int MinTimePrediagnosis,MaxTimePrediagnosis,minTime,maxTime,policy,PolypsFound,model;
double ProgramAgeAtScreen[F][M][S];
double SummaryPolicyMeasure[5][4][NumPolicy][NumMeasure];
double RootMeanPolypSize;
double CurrentAge;
double PreclinicallyDetectableAge[F][M][P],
NumPreclinicalTumors,NumPreclinicallyDetectableTumors,
MeanPreclinicallyDetectableTime,VarPreclinicallyDetectableTime;
int AA;
double Y10[MaxAA],Y5[MaxAA],Z10[MaxAA][12],Weight[MaxAA],
gamma[NumModel][NumPolicy][12],SEgamma[NumModel][NumPolicy][12],PopulationAverageP10;
int NumFirstScreenTargetAge[100],NumTimeBetweenScreens[2][200];
int CounterfactualClinicalCancer[F][M][NumPolicy],
CounterfactualScreenedCancer[F][M][NumPolicy],
CounterfactualPolyps[F][M][NumPolicy],
CounterfactualPositiveScreens[F][M][NumPolicy];
double AverageRisk[100][2];
char FileName[14];
FILE *sum,*sim,*chk;
// * * * * * * * * * * * * * * * * * * * *
// S I M U L A T I O N R O U T I N E S
// * * * * * * * * * * * * * * * * * * * *
void GetScreeningCovariates (int ff, int mm, double age)
{ ScreeningCovar[0] = TotalXforPolyps[ff][mm];
ScreeningCovar[1] = TotalXforScreening[ff][mm];
ScreeningCovar[4] = 0; // number of previous screens
ScreeningCovar[5] = 0; // number of positive screens
double AgeAtLastScreen=0;
for (int ss=0; ss
void GetDiseaseCovariates(int pp, int cc, int ff, int mm, double age)
{
DiseaseCovar[pp][cc][0] = TotalXforPolyps[ff][mm];
DiseaseCovar[pp][cc][1] = TotalXforScreening[ff][mm];
DiseaseCovar[pp][cc][2] = 0;
DiseaseCovar[pp][cc][3] = 0;
for (int ss=0; ss PairAge[pp]-MaxTimePrediagnosis)
{ if (ScreenCompliant[ff][mm][ss]) DiseaseCovar[pp][cc][2] ++;
if (Ndetected[ff][mm][ss]) DiseaseCovar[pp][cc][3] ++;
}
if (ScreeningVariable==1 && DiseaseCovar[pp][cc][2])
DiseaseCovar[pp][cc][2]=1; // binary version of ever screened during interval before reference date
if (NumDiseaseCovar>4)
{ GetScreeningCovariates(ff,mm,age);
for (v=0; v
if (a>97) CenterMean[v] = LastCenterMean[v];
LastCenterMean[v] = CenterMean[v];
}
MeanNdetected[a] = CenterMean[0];
MeanSibsScreened[a] = CenterMean[1];
MeanSibsPolyps[a] = CenterMean[2];
MeanSibsCancer[a] = CenterMean[3];
if (MeanSibsCancer[a]
VarAgeOfPolyps += pow(AgeOfPolyp[f][m][p],2);
if (AgeOfPolyp[f][m][p]+0.746) XdiseaseGroup=2;
MeanTimeToMutation += TimeToMutation;
VarTimeToMutation += pow(TimeToMutation,2);
MeanTimeToCancer += TimeToCancer;
VarTimeToCancer += pow(TimeToCancer,2);
if (AgeOfTumor[f][m][p]
CurrentAge=999; NextSib=-9;
for (m=0; m0.01 && error==0)
{ memset(U,0,sizeof(U));
memset(Info,0,sizeof(Info));
for (int aa=0; aa
+ gamma[4][policy][ 2]*gFH
+ gamma[4][policy][ 3]*gX*gFH
+ gamma[4][policy][ 4]*logage50
+ gamma[4][policy][ 5]*logage50*gX
+ gamma[4][policy][ 6]*logage50*gFH
+ gamma[4][policy][ 7]*logage50*gX*gFH
+ gamma[4][policy][ 8]*logage50*logage50
+ gamma[4][policy][ 9]*logage50*logage50*gX
+ gamma[4][policy][10]*logage50*logage50*gFH
+ gamma[4][policy][11]*logage50*logage50*gX*gFH;
double P5 = exp(logitP5); P5 /= 1+P5;
TTAR = 5*AverageRisk/P5;
}
else
{ double logitP10 = gamma[2][policy][ 0]
+ gamma[2][policy][ 1]*gX
+ gamma[2][policy][ 2]*gFH
+ gamma[2][policy][ 3]*gX*gFH
+ gamma[2][policy][ 4]*logage50
+ gamma[2][policy][ 5]*logage50*gX
+ gamma[2][policy][ 6]*logage50*gFH
+ gamma[2][policy][ 7]*logage50*gX*gFH
+ gamma[2][policy][ 8]*logage50*logage50
+ gamma[2][policy][ 9]*logage50*logage50*gX
+ gamma[2][policy][10]*logage50*logage50*gFH
+ gamma[2][policy][11]*logage50*logage50*gX*gFH;
double P10 = exp(logitP10); P10 /= 1+P10;
TTAR = 10*AverageRisk/P10;
}
if (TTAR
fprintf (sum,"\n\nSUMMARY STATISTICS BY SCREEN NUMBER");
fprintf (sum,"\n s Num subj Mean prop N compliant Polyp N Polyps Prob polyps Mean polyp Num polyps Time from Mean polyp ");
fprintf (sum,"\n screened age compliant (cumulative) growth screened detected size at detected polyp to size at ");
fprintf (sum,"\n rate screening per compliant detection detection");
int CumNCompliant=0;
for (s=0; s-0.746) XscreeningGroup=1;
if (TotalXforScreening[f][m]>+0.746) XscreeningGroup=2;
NXdisease[XdiseaseGroup] ++;
NXscreening[XscreeningGroup] ++;
for (a=0; a
}
fprintf (sum, "\n");
for (a=0; a
if (TotalXforPolyps[f][m]>+0.746) gX=2;
GetScreeningCovariates(f,m,double(a));
int gFH=0; if (ScreeningCovar[8]) gFH=1;
double py = AgeAtCensoring[f][m]-a;
double pyFUpos = AgeAtCensoring[f][m]-a;
if (AgeAtCancer[f][m] < AgeAtCensoring[f][m]) py = AgeAtCancer[f][m]-a;
if (py>10) py=10;
totalPY[gX][gFH][a] += py;
if (AgeAtCancer[f][m] < a+10 &&
AgeAtCancer[f][m] < AgeAtCensoring[f][m])
{ totalCases[gX][gFH][a] ++;
if (a>=20 && a5) pyFUpos=5;
totalFUposPY[gX][gFH][a] += pyFUpos;
if (AgeAtCancer[f][m] < a+5 &&
AgeAtCancer[f][m] < AgeAtCensoring[f][m])
totalFUposCases[gX][gFH][a] ++;
}
s=0;
int LastScreen=-1;
while (AgeAtScreen[f][m][s]
NcasesByNumScreens[gX]/PYbyNumScreens[gX],NcasesByNumScreens[gX],PYbyNumScreens[gX]);
fprintf (sum, "\nCancer SIR (O/E) ");
for (gX=0; gX
if (covar==1) fprintf (sum,"\n\nNumber of sibs' positive screens ");
if (covar==2) fprintf (sum,"\n\nNumber of sibs with cancer ");
if (covar==3) fprintf (sum,"\n\nSummary sib risk index ");
for (gX=0; gX
}
printf ( "\nLogistic regression predicted probabilities of 10-yr cancer risks under policy %d",policy);
fprintf (sum,"\n\nLogistic regression predicted probabilities of 10-yr cancer risks under policy %d",policy);
fprintf (sum,"\nage X0 FH=0 FH>0 FH=0 FH>0 average");
for (a=20; a
for (a=0; a
lnL += log(RRcase / (RRcase/PScase + RRctl/PSctl) );
for (int v1=0; v1
void LogisticRegression2()
{ double U[NumVar],Info[NumVar][NumVar],InvInfo[NumVar][NumVar];
// fprintf (sum,"\n\n\nIterations for conditional propensity model for screening");
double chisq2=999,Wald2;
memset (Alpha2,0,sizeof(Alpha2));
for (v=0; v0.01 && error==0)
{ memset(U,0,sizeof(U));
memset(Info,0,sizeof(Info));
for (nn=0; nn
if (v==0) fprintf (sum," intercept");
if (v==11 || v==12) fprintf (chk," %10.6f %8.6f ",Alpha2[v],SEalpha2[v]);
if (v==1) fprintf (sum," (age-70)");
if (v==2) fprintf (sum," (age-70)^2");
if (v==3) fprintf (sum," (age-70)^3");
if (v==4) fprintf (sum," X for disease");
if (v==5) fprintf (sum," X for screening");
if (v==6) fprintf (sum," First screen indicator");
if (v==7) fprintf (sum," Time since last screen");
if (v==8) fprintf (sum," Num previous screens");
if (v==9) fprintf (sum," Num positive screens");
if (v==10) fprintf (sum," Sibs previously screened");
if (v==11) fprintf (sum," Sibs previously positive");
if (v==12) fprintf (sum," Sibs with cancer");
if (v==13) fprintf (sum," Summary sib risk index");
}
fprintf (sum,"\n\nMean (SD) log Propensity Scores = %6.3f (%5.3f)\n",
MeanLnPropensityScore/(F*M),
sqrt((VarLnPropensityScore-pow(MeanLnPropensityScore,2)/(F*M))/(F*M-1)));
}
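// Illustrative sketch (not part of the original program): the Mean (SD) line
// printed just above appears to use a running sum and a running sum of squares,
// with SD = sqrt((SumSq - Sum^2/n)/(n-1)). The accumulation statements are not
// visible in this listing, so the struct below is an assumption about that
// pattern; RunningMoments, Add, Mean and SD are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Accumulates sum and sum of squares so the mean and sample SD can be
// computed at print time without a second pass over the data.
struct RunningMoments
{ double sum = 0, sumSq = 0;
  long n = 0;

  void Add(double x) { sum += x; sumSq += x*x; n ++; }

  double Mean() const
  { assert(n > 0);
    return sum/n;
  }

  // Sample standard deviation with the n-1 denominator, matching the
  // form sqrt((SumSq - Sum*Sum/n)/(n-1)) used in the fprintf above.
  double SD() const
  { assert(n > 1);
    return std::sqrt((sumSq - sum*sum/n)/(n-1));
  }
};
```

// The one-pass formula can lose precision when the mean is large relative to
// the spread; for log propensity scores of moderate size this is unlikely to matter.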
// * * * * * * * * * * * * * * * * * * * * * * * * * * * *
// P R O J E C T E D O U T C O M E S O F S C R E E N I N G P O L I C I E S
// * * * * * * * * * * * * * * * * * * * * * * * * * * * *
double TargetAgeAtFirstScreen(int policy, double age)
{ double gX = TotalXforPolyps[f][m];
GetScreeningCovariates(f,m,age);
int gFH=0; if (ScreeningCovar[8]) gFH=1;
double logitAvgP10 = log(AverageRisk[50][0] / (1 - AverageRisk[50][0]));
double A = gamma[2][policy][8] + gamma[2][policy][9]*gX + gamma[2][policy][10]*gFH + gamma[2][policy][11]*gX*gFH;
double B = gamma[2][policy][4] + gamma[2][policy][5]*gX + gamma[2][policy][ 6]*gFH + gamma[2][policy][ 7]*gX*gFH;
double C = gamma[2][policy][0] + gamma[2][policy][1]*gX + gamma[2][policy][ 2]*gFH + gamma[2][policy][ 3]*gX*gFH
- gamma[1][4][0];
double dt = (-B + sqrt(B*B-4*A*C))/(2*A);
double t = 50*exp(dt);
double logage50 = log(t/50);
if (t80) t=81;
if (policy==4) NumFirstScreenTargetAge[int(t)] ++;
// check that predicted risk at this age equals population average
double logitP10 = gamma[2][policy][ 0]
+ gamma[2][policy][ 1]*gX
+ gamma[2][policy][ 2]*gFH
+ gamma[2][policy][ 3]*gX*gFH
+ gamma[2][policy][ 4]*logage50
+ gamma[2][policy][ 5]*logage50*gX
+ gamma[2][policy][ 6]*logage50*gFH
+ gamma[2][policy][ 7]*logage50*gX*gFH
+ gamma[2][policy][ 8]*logage50*logage50
+ gamma[2][policy][ 9]*logage50*logage50*gX
+ gamma[2][policy][10]*logage50*logage50*gFH
+ gamma[2][policy][11]*logage50*logage50*gX*gFH;
double P10 = exp(logitP10); P10 /= 1+P10;
return(t);
}
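// Illustrative sketch (not part of the original program): TargetAgeAtFirstScreen
// above treats the fitted logit as a quadratic in x = log(age/50), collapses the
// 12 gamma coefficients into A, B and C for the subject's gX and gFH, and solves
// for the age at which the logit equals a target value. The helpers below restate
// that algebra in isolation. LogAgeQuadratic, CollapseCoefficients, LogitAtAge and
// AgeAtTargetLogit are hypothetical names, and the guards for a zero leading
// coefficient and a negative discriminant are additions not present in the listing.

```cpp
#include <cassert>
#include <cmath>

// Coefficients of logit(P) = C + B*x + A*x*x, where x = log(age/50).
struct LogAgeQuadratic { double A, B, C; };

// g[0..3] multiply 1, gX, gFH, gX*gFH; g[4..7] multiply the same four terms
// times x; g[8..11] multiply them times x*x (the layout of gamma[model][policy][]).
LogAgeQuadratic CollapseCoefficients(const double *g, double gX, int gFH)
{ assert(g != nullptr);
  LogAgeQuadratic q;
  q.C = g[0] + g[1]*gX + g[ 2]*gFH + g[ 3]*gX*gFH;
  q.B = g[4] + g[5]*gX + g[ 6]*gFH + g[ 7]*gX*gFH;
  q.A = g[8] + g[9]*gX + g[10]*gFH + g[11]*gX*gFH;
  return q;
}

// Predicted logit at a given age; useful for checking a solved age.
double LogitAtAge(const LogAgeQuadratic &q, double age)
{ double x = std::log(age/50.0);
  return q.C + q.B*x + q.A*x*x;
}

// Solves A*x*x + B*x + (C - targetLogit) = 0 using the same root as the
// listing, (-B + sqrt(B*B - 4*A*C'))/(2*A), and converts x back to an age.
// Returns false when no real solution exists.
bool AgeAtTargetLogit(const LogAgeQuadratic &q, double targetLogit, double &age)
{ double c = q.C - targetLogit;
  if (q.A == 0.0)
  { if (q.B == 0.0) return false;        // constant logit: no unique age
    age = 50.0*std::exp(-c/q.B);         // linear case
    return true;
  }
  double disc = q.B*q.B - 4*q.A*c;
  if (disc < 0) return false;            // target never reached
  age = 50.0*std::exp((-q.B + std::sqrt(disc))/(2*q.A));
  return true;
}
```

// Calling LogitAtAge on the solved age and comparing with the target reproduces
// the check that the logitP10 block at the end of TargetAgeAtFirstScreen is meant to do.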
double TimeToAverageRisk(int LastPositive, int policy, double AverageRisk)
{ double TTAR=5;
double gX = TotalXforPolyps[f][m];
double t=ProgramAgeAtScreen[f][m][s-1];
double logage50 = log(t/50);
GetScreeningCovariates(f,m,t);
int gFH=0; if (ScreeningCovar[8]) gFH=1;
if (LastPositive)
{ double logitP5 = gamma[4][policy][ 0]
+ gamma[4][policy][ 1]*gX
+ gamma[4][policy][ 2]*gFH
+ gamma[4][policy][ 3]*gX*gFH
+ gamma[4][policy][ 4]*logage50
+ gamma[4][policy][ 5]*logage50*gX
+ gamma[4][policy][ 6]*logage50*gFH
+ gamma[4][policy][ 7]*logage50*gX*gFH
+ gamma[4][policy][ 8]*logage50*logage50
+ gamma[4][policy][ 9]*logage50*logage50*gX
+ gamma[4][policy][10]*logage50*logage50*gFH
+ gamma[4][policy][11]*logage50*logage50*gX*gFH;
double P5 = exp(logitP5); P5 /= 1+P5;
TTAR = 5*AverageRisk/P5;
if (TTAR>4.9 && TTAR
TTAR = 10*AverageRisk/P10;
}
if (TTAR=20 && ProgramAgeAtScreen[f][m][s-1]
for (s=0; s
// number of tumors prevented (i.e., polyps screen detected)
// out of all possibly preventable (i.e.,
if (AgeOfTumor[f][m][p] < AgeAtCensoring[f][m])
{ NumTumorsPreventable ++;
if (ProgramAgePolypDetected[f][m][p] < AgeOfTumor[f][m][p])
NumTumorsPrevented ++;
}
}
if (TotalPolypsThisScreen)
{ PositiveScreens ++;
CounterfactualPositiveScreens[f][m][policy] ++;
}
TotalPolyps += TotalPolypsThisScreen;
CounterfactualPolyps[f][m][policy] += TotalPolypsThisScreen;
}
}
SummaryPolicyMeasure[minTime][maxTime][policy][ 0] = TotalScreens;
SummaryPolicyMeasure[minTime][maxTime][policy][ 1] = PositiveScreens;
SummaryPolicyMeasure[minTime][maxTime][policy][ 2] = TotalPolyps;
SummaryPolicyMeasure[minTime][maxTime][policy][ 3] = TotalCancers;
SummaryPolicyMeasure[minTime][maxTime][policy][ 4] = NumFN;
// SummaryPolicyMeasure[minTime][maxTime][policy][ 5] = NumCancersWithin10Years;
SummaryPolicyMeasure[minTime][maxTime][policy][ 5] = NumScreenNeg;
SummaryPolicyMeasure[minTime][maxTime][policy][ 6] = NumFP;
SummaryPolicyMeasure[minTime][maxTime][policy][ 7] = NumPolypsNeverCancer;
SummaryPolicyMeasure[minTime][maxTime][policy][ 8] = NumTumorsPrevented;
SummaryPolicyMeasure[minTime][maxTime][policy][ 9] = NumTumorsPreventable;
SummaryPolicyMeasure[minTime][maxTime][policy][10] = NumCancersScreenDetected;
SummaryPolicyMeasure[minTime][maxTime][policy][11] = NumPolypsScreened;
}
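// Illustrative sketch (not part of the original program): the Counterfactuals
// routine that follows compares each subject's outcome count under two policies
// and increments one of four cells. Judging from the visible if/else chain for
// positive screens, the cells are: 0 when the first policy's count is larger,
// 2 when it is smaller, 1 when both counts are zero, and 3 otherwise (equal and
// nonzero). The function name CounterfactualCell is hypothetical.

```cpp
#include <cassert>

// Returns the cell index (0..3) for a pair of per-subject counts observed
// under policy1 (count1) and policy2 (count2).
int CounterfactualCell(int count1, int count2)
{ assert(count1 >= 0 && count2 >= 0);   // counts cannot be negative
  if (count1 > count2) return 0;        // more events under policy1
  if (count1 < count2) return 2;        // more events under policy2
  if (!count1 && !count2) return 1;     // no events under either policy
  return 3;                             // same nonzero count under both
}
```

// With such a helper, each tabulation would reduce to a single statement of the
// form Table[policy1][policy2][CounterfactualCell(a,b)] ++.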
void Counterfactuals()
{ int CounterfactualNumClinicalCancers[NumPolicy][NumPolicy][4],
CounterfactualNumScreenedCancers[NumPolicy][NumPolicy][4],
CounterfactualNumPolyps[NumPolicy][NumPolicy][4],
CounterfactualNumPositiveScreens[NumPolicy][NumPolicy][4];
memset (CounterfactualNumClinicalCancers,0,sizeof(CounterfactualNumClinicalCancers));
memset (CounterfactualNumScreenedCancers,0,sizeof(CounterfactualNumScreenedCancers));
memset (CounterfactualNumPolyps,0,sizeof(CounterfactualNumPolyps));
memset (CounterfactualNumPositiveScreens,0,sizeof(CounterfactualNumPositiveScreens));
int policy1,policy2;
for (policy1=1; policy1 CounterfactualPositiveScreens[f][m][policy2])
CounterfactualNumPositiveScreens[policy1][policy2][0] ++;
else if (CounterfactualPositiveScreens[f][m][policy1] < CounterfactualPositiveScreens[f][m][policy2])
CounterfactualNumPositiveScreens[policy1][policy2][2] ++;
else if (!CounterfactualPositiveScreens[f][m][policy1] && !CounterfactualPositiveScreens[f][m][policy2])
CounterfactualNumPositiveScreens[policy1][policy2][1] ++;
else CounterfactualNumPositiveScreens[policy1][policy2][3] ++;
}
}
fprintf (sum,"\n\nSummary of counterfactual outcomes for time window %d-%d",MinTimePrediagnosis,MaxTimePrediagnosis);
fprintf (sum,"\n\nCounterfactual clinical cancers:");
for (policy1=1; policy1
fprintf (sum,"\n\nCounterfactual total number of polyps detected:");
for (policy1=1; policy1
if (measure== 9) fprintf (sum," tumors preventable");
if (measure==10) fprintf (sum," total cancers screen detected");
if (measure==11) fprintf (sum," total polyps screened (detected or not)");
if (measure==12) fprintf (sum," polyps detected");
if (measure==1)
{ fprintf (sum,"\n%2d/%2d ",1,0);
for (policy=0; policy
fprintf (sum,"\n\nF=%d, M=%d, P=%d",F,M,P);
fprintf (sum,"\nlambda "); for (v=0; v