ORIGINAL PAPER
Analysis of lung cancer incidence in the nurses’ health and the health professionals’ follow-up studies using a multistage carcinogenesis model
Rafael Meza · William D. Hazelton · Graham A. Colditz · Suresh H. Moolgavkar
Received: 10 July 2007 / Accepted: 12 November 2007 / Published online: 6 December 2007
© Springer Science+Business Media B.V. 2007
Abstract We analyzed lung cancer incidence among
non-smokers, continuing smokers, and ex-smokers in the
Nurses Health Study (NHS) and the Health Professionals
Follow-Up Study (HPFS) using the two-stage clonal
expansion (TSCE) model. Age-specific lung cancer inci-
dence rates among non-smokers are identical in the two
cohorts. Within the framework of the model, the main
effect of cigarette smoke is on the promotion of partially
altered cells on the pathway to cancer. Smoking-related
promotion is somewhat higher among women, whereas
smoking-related malignant conversion is somewhat lower.
In both cohorts the relative risk for a given daily level of
smoking is strongly modified by duration. Among smokers,
the incidence in NHS relative to that in HPFS depends both
on smoking intensity and duration. The age-adjusted risk is
somewhat larger in NHS, but not significantly so. After
smokers quit, the risk decreases over a period of many
years and the temporal pattern of the decline is similar to
that reported in other recent studies. Among ex-smokers,
the incidence in NHS relative to that in HPFS depends both
on previous levels of smoking and on time since quitting.
The age-adjusted risk among ex-smokers is somewhat
higher in NHS, possibly due to differences in the age-
distribution between the two cohorts.
Keywords Lung cancer epidemiology · Lung cancer age-specific incidence · Never smokers lung cancer risk · Smokers relative risk · Ex-smokers relative risk · Multistage carcinogenesis · Two-stage clonal expansion model
Introduction
The Nurses Health Study (NHS) and the Health Professionals Follow-Up Study (HPFS) constitute outstanding
datasets with which to investigate in detail the relationship between
smoking and lung cancer, and to evaluate the influence of
gender both on background and smoking-induced risks. We
analyze the consequences of smoking and smoking cessa-
tion on the lung cancer incidence rates in the NHS and
HPFS using multistage carcinogenesis models. This
approach allows us to explicitly consider the entire smok-
ing histories of individuals in these cohorts, including
complex time-related factors, such as ages at start and quit,
and changes in smoking habits.
We use likelihood-based methods to estimate the
parameters of the two-stage clonal expansion (TSCE)
model. Using the model with the estimated parameters, we
construct age-specific incidence curves for non-smokers,
and for smokers and ex-smokers with pre-specified histories
of smoking. We also investigate the roles of daily intensity
of smoking and of duration of smoking on lung cancer risk.
In particular, for a given level of smoking, we examine the
impact of duration of smoking on the relative risk (RR).
Electronic supplementary material The online version of this article (doi:10.1007/s10552-007-9094-5) contains supplementary material, which is available to authorized users.
R. Meza · W. D. Hazelton · S. H. Moolgavkar (✉)
Division of Public Health Sciences, Fred Hutchinson Cancer
Research Center, 1100 Fairview Avenue North, M2-B500,
Seattle, WA 98109-1024, USA
e-mail: [email protected]
G. A. Colditz
Channing Laboratory, Brigham and Women’s Hospital and
Harvard Medical School, Boston, MA, USA
G. A. Colditz
Siteman Cancer Center, Washington University Medical School,
St. Louis, MO, USA
Cancer Causes Control (2008) 19:317–328
DOI 10.1007/s10552-007-9094-5
The question of whether, for a given level of smoking,
females are at greater risk than males of developing lung
cancer has generated a great deal of debate [1–6]. The NHS
consists entirely of females and the HPFS consists entirely
of males. Our methods allow us to analyze the NHS and
HPFS simultaneously using a common model; hence we
can evaluate, within a single framework, similarities and
differences in the lung cancer risk in females and males.
Finally, we use our fitted models to project lung cancer
risks for various smoking scenarios.
Methods
The nurses health and the health professionals
follow-up studies
The Nurses Health Study (NHS) was established in 1976 by
Dr. Frank Speizer. The cohort consists of 121,700 nurses
aged 30–55 at the beginning of follow-up. Every two years
the nurses return mailed questionnaires about diseases, smoking
status, hormone use, and diet, among many other
health-related issues. The Health Professionals Follow-up Study (HPFS)
was established in 1986 by Dr. Walter Willett. The
cohort consists of 51,529 men in the health professions aged
40–75 at the beginning of follow-up. As in the NHS, the
health professionals receive questionnaires every two years
about diseases and health-related topics like smoking,
physical activity, and medications taken. In addition, the
participants respond to questionnaires about their diet every
four years. Less than 10% of subjects in the NHS and 7% in
the HPFS have been lost to follow-up. We exclude from our
analysis individuals who do not have complete smoking
information, non-Caucasians and those with prior history of
cancer (other than non-melanoma skin cancer). Table 1
describes the subpopulations of both cohorts that we use in
our analysis.
Although the two studies began a decade apart, the age and
birth-year distributions of both cohorts are quite similar.
Smoking histories
Smoking histories, which we denote d(t), are piecewise
constant functions representing the number of cigarettes
per day smoked by each subject at any particular age, t. We
construct the smoking histories in the following way.
Clearly d(0) = 0 and we keep this constant until the age at
start of smoking. Subjects in both cohorts report their
smoking intensities at different time points in the following
categories: 0–4 cig/day, 5–14 cig/day, 15–24 cig/day,
25–34 cig/day, 35–44 cig/day, >45 cig/day. We assign
the midpoint of each category as the corresponding
intensity for that category, or 50 if the intensity is >45 cig/day.
Once we have done this, we compute the ages at which
subjects changed their smoking intensities (change points
of d(t)) and assign the corresponding intensity value to d(t).
In particular, we use the smoking information reported in
the initial questionnaire to calculate the change points up to
the age at entry into the study. In the initial questionnaire,
participants responded to questions about their past
smoking habits. The information provided at entry differs
between the cohorts. In the NHS, the nurses reported the
age at which they started smoking if they did, their average
smoking rate and the quitting age for ex-smokers. Thus, for
the NHS subjects, there are at most two ages of interest
before the age at entry, namely the age at start of smoking
and the age at quitting. In contrast, the subjects in the HPFS
reported their average smoking rate during specific
age-periods before their age at entry (<15, 15–19, 20–29,
30–39, 40–49, 50–59, >60 years). In this case, we only
consider the information from the age-periods whose right end
point is lower than the age at entry to the study. In addition,
we only take the information in the >60 age-interval if the
age at entry to the HPFS is at least 70 years. We assume
that the age at start of HPFS smokers is given by the
mid-point of the first age-period with a positive smoking rate. If
the first positive exposure occurs in the <15 or >60
age-interval, then we use 13 or 65 years as the starting
age, respectively. Finally, if there is a change in the
smoking dose or status between two consecutive age-
periods, we assume that the change occurred in the mid-
point year in between them and assign the corresponding
age as a change point of the smoking history. The final step
is to calculate the change points after the age of entry to the
studies. This is done in a simple way, since the subjects in
both cohorts report their smoking status and dose every two
years after their entry into the study. In particular, we
compare the smoking intensities between consecutive
questionnaires and if they differ, we assume that the
change occurred at the beginning of the mid-point year and
assign the corresponding age as a change point of the
smoking history. Typical smoking histories for members of
the NHS and HPFS are shown in Fig. 1.
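The construction of d(t) described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names `build_history` and `dose_at`, the category labels, and the example subject are hypothetical, while the category midpoints and the value of 50 cig/day for the top category follow the text.

```python
from bisect import bisect_right

# Midpoint of each reported category; the open-ended top category is set to 50.
CATEGORY_INTENSITY = {
    "0-4": 2.0,
    "5-14": 9.5,
    "15-24": 19.5,
    "25-34": 29.5,
    "35-44": 39.5,
    ">45": 50.0,
}


def build_history(change_points):
    """Return (ages, doses) describing a piecewise-constant d(t).

    change_points: iterable of (age, category) pairs, where category is a key
    of CATEGORY_INTENSITY or None for "not smoking" (hypothetical encoding).
    """
    ages = [0.0]
    doses = [0.0]  # d(0) = 0 until the age at start of smoking
    for age, category in sorted(change_points, key=lambda point: point[0]):
        dose = 0.0 if category is None else CATEGORY_INTENSITY[category]
        if dose != doses[-1]:  # record only genuine changes in intensity
            ages.append(float(age))
            doses.append(dose)
    return ages, doses


def dose_at(ages, doses, t):
    """Evaluate d(t): the dose set by the last change point at or before t."""
    return doses[bisect_right(ages, t) - 1]


# Hypothetical subject: starts at 18, increases at 30, quits at 45.
ages, doses = build_history([(18, "15-24"), (30, "25-34"), (45, None)])
```

Storing only the change points keeps the representation compact, which matters when a history has to be evaluated for every subject at every likelihood evaluation.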
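Table 1 Classification according to status at baseline (NHS-1976, HPFS-1986)

| Cohort | Measure | Total | Never | Former | Smokers |
| --- | --- | --- | --- | --- | --- |
| NHS | Subjects | 104,493 | 51,121 | 24,474 | 28,898 |
| NHS | Lung cancer cases | 1,165 | 130 | 134 | 901 |
| NHS | Avg. follow up | 23.15 years | | | |
| HPFS | Subjects | 46,050 | 22,431 | 19,632 | 3,987 |
| HPFS | Lung cancer cases | 461 | 58 | 247 | 156 |
| HPFS | Avg. follow up | 14.93 years | | | |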
We use likelihood-based methods to estimate parame-
ters of the TSCE model, which are functions of the
smoking history. A brief description of the model and
details of the likelihood construction are presented below.
The two-stage clonal expansion model
For our analyses, we use a multistage model that
acknowledges three phases in the process of carcinogene-
sis. In the first phase (initiation) a susceptible stem cell
acquires one or more mutations resulting in an initiated
cell, which has partially escaped growth control. In the
second phase (promotion) initiated cells undergo clonal
expansion, either spontaneously or in response to endoge-
nous or exogenous promoters. Promotion is an extremely
efficient way to bring about malignant conversion because
clonal expansion of initiated cells creates a large popula-
tion of cells that have acquired some of the genetic changes
required for malignant transformation. Finally, in the third
phase (malignant conversion) one of the initiated cells
acquires further mutational changes leading to a malignant
cell. The simplest model incorporating these three phases is
the TSCE model [7, 8]. A schematic representation of the
model is shown in the Appendix.
The TSCE model assumes that normal stem cells
become initiated according to a Poisson process with
intensity $\mu_0 X$, where $X$ is the number of susceptible stem
cells. Initiated cells expand clonally (promotion) via a
linear birth and death process with rates $(\alpha, \beta)$. This means
that each time that an initiated cell divides, it can produce
two initiated cells (with birth rate $\alpha$) or die/differentiate
(with death rate $\beta$). Initiated cells can also divide into
one initiated and one malignant cell (with rate $\mu_1$). The
time between the first malignant cell and diagnosis is
modeled either as a constant or gamma-distributed lag.
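For intuition, the sketch below evaluates the well-known closed-form TSCE survival and hazard for the special case of constant parameters. It is a simplification and not the authors' implementation: the analysis in this paper relies on piecewise-constant parameters [11], which the sketch does not handle. The numerical values are the background estimates listed in Table 2; treating the rates as per-year quantities and the function names are assumptions made for illustration.

```python
import math

# Background values copied from Table 2; per-year units are an assumption.
X = 1e7        # susceptible stem cells (fixed)
ALPHA = 3.0    # initiated-cell division rate (fixed)
MU0 = 8.14e-8  # initiation rate per stem cell
MU1 = 8.14e-8  # malignant-conversion rate (mu0 = mu1 in Table 2)
G = 0.0956     # net promotion rate, g = alpha - beta - mu1


def _roots(g, alpha, mu1):
    """Return (p, q) with p < 0 < q and p * q = -alpha * mu1."""
    disc = math.sqrt(g * g + 4.0 * alpha * mu1)
    return 0.5 * (-g - disc), 0.5 * (-g + disc)


def tsce_survival(t, nu_x=MU0 * X, alpha=ALPHA, g=G, mu1=MU1):
    """Probability of no malignant cell by age t (constant parameters)."""
    p, q = _roots(g, alpha, mu1)
    denom = q * math.exp(-p * t) - p * math.exp(-q * t)
    return ((q - p) / denom) ** (nu_x / alpha)


def tsce_hazard(t, nu_x=MU0 * X, alpha=ALPHA, g=G, mu1=MU1):
    """Hazard h(t) = -d/dt log S(t) for the same constant-parameter model."""
    p, q = _roots(g, alpha, mu1)
    denom = q * math.exp(-p * t) - p * math.exp(-q * t)
    numer = math.exp(-q * t) - math.exp(-p * t)
    return (nu_x / alpha) * p * q * numer / denom
```

For small t the hazard behaves like $\mu_0 X \mu_1 t$, the product of the expected number of initiated cells and the conversion rate, which is a convenient sanity check on any implementation.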
Each of the parameters of the TSCE model may, in
principle, be affected by cigarette smoke. Recall that d(t)
denotes the cigarette consumption of an individual at age t.
We assume that each of the identifiable parameters of the
model (see Table 2) has a dose–response given by
$$\theta_{tob}(d(t)) = \theta \left(1 + \theta_c \, d(t)^{\theta_e}\right) \qquad (1)$$
where $\theta$ is the background parameter, and $\theta_c$ and $\theta_e$ are the
dose–response coefficients. Previous analyses using the
TSCE model [9, 10] of the relationship between smoking
and the lung cancer rates suggested that power laws are
good models for the smoking dose–response [9]. We esti-
mate the background rates and the cigarette dose–response
coefficients for each identifiable parameter in the model.
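As a worked illustration of the power-law dose–response, reading Eq. (1) as $\theta_{tob}(d) = \theta (1 + \theta_c d^{\theta_e})$, the following Python sketch applies it to the promotion rate using the joint-model estimates listed in Table 2. The function and constant names are hypothetical, and the sketch is not the authors' Fortran implementation.

```python
def dose_response(theta, theta_c, theta_e, dose):
    """Eq. (1): background rate theta scaled by (1 + theta_c * dose**theta_e)."""
    if dose < 0:
        raise ValueError("dose must be non-negative")
    return theta * (1.0 + theta_c * dose ** theta_e)


# Promotion-rate estimates copied from Table 2 (joint model).
G_BACKGROUND = 0.0956                      # background net promotion rate g
G_POWER = 0.5171                           # power g_e, common to both cohorts
G_COEF = {"NHS": 0.1458, "HPFS": 0.1123}   # coefficient g_c by cohort


def promotion_rate(cohort, dose):
    """Net promotion rate for a smoker of `dose` cigarettes per day."""
    return dose_response(G_BACKGROUND, G_COEF[cohort], G_POWER, dose)
```

Because the fitted power is below one, doubling the daily dose less than doubles the tobacco-related increment in the promotion rate, which is the sub-linear behaviour discussed in the Results.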
[Fig. 1 Example of smoking histories for members of the NHS and HPFS. Top panel, subject of the NHS. Bottom panel, subject of the HPFS. Both panels plot the smoking dose (number of cigarettes per day) against age and mark the age at entry, the follow-up period, and the censoring time. One subject was born in June 1933 and died from another cause; the other was born in December 1931 and was lung cancer free at the end of follow-up.]

Likelihood function
The likelihood function is the product of individual
likelihoods over all the subjects in the cohort(s). Each
participant was lung cancer free at the beginning of the
study. Hence, to calculate the likelihood, we must condi-
tion on the fact that the individuals did not have the clinical
disease at their age of entry to the study ($a_{e_i}$). Subjects are
censored in case of death by any other cause or in case they
survive and were never diagnosed with lung cancer until
the end of follow-up (year 2000 for our analysis). In
addition, we also censor any individuals who were
diagnosed with other types of cancer, except
non-melanoma skin cancer. Let $a_{l_i}$ be the censoring or failure
(lung cancer diagnosis) age. The individual likelihoods are
$$L_i(a_{l_i}, a_{e_i}; \bar{\theta}(d_i)) = \begin{cases} -\dfrac{S'(a_{l_i}; \bar{\theta}(d_i))}{S(a_{e_i}; \bar{\theta}(d_i))} & \text{for lung cancer cases,} \\[2ex] \dfrac{S(a_{l_i}; \bar{\theta}(d_i))}{S(a_{e_i}; \bar{\theta}(d_i))} & \text{otherwise,} \end{cases} \qquad (2)$$
where $S(t; \bar{\theta}(d_i))$ is the survival probability at age $t$ of an
individual with smoking history $d_i$, and $\bar{\theta}(d_i)$ denotes the
vector of identifiable model parameters given the smoking
history $d_i$ (Note: the prime denotes derivative with respect
to $t$). The overall likelihood is then
$$L = \prod_i L_i(a_{l_i}, a_{e_i}; \bar{\theta}(d_i)), \qquad (3)$$
where the product is taken over all the subjects in the
cohort(s).
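The conditioning on being disease free at entry can be made concrete with a short Python sketch of the log of expressions (2) and (3). It is illustrative only: the function names are hypothetical, and for brevity a single survival function is shared by all subjects, whereas in the analysis each subject's survival depends on his or her own smoking history.

```python
import math


def log_likelihood_i(survival, density, age_exit, age_entry, is_case):
    """Log of expression (2) for one subject.

    survival(t) returns S(t); density(t) returns -S'(t). Both are supplied by
    the caller, so the sketch does not depend on any particular model.
    """
    numerator = density(age_exit) if is_case else survival(age_exit)
    # Dividing by S(age_entry) conditions on being disease free at entry.
    return math.log(numerator) - math.log(survival(age_entry))


def log_likelihood(subjects, survival, density):
    """Log of expression (3): sum of the individual log-likelihoods.

    subjects: iterable of (age_exit, age_entry, is_case) tuples.
    """
    return sum(
        log_likelihood_i(survival, density, a_l, a_e, case)
        for a_l, a_e, case in subjects
    )
```

Working on the log scale turns the product in expression (3) into a sum, which avoids numerical underflow when the cohort contains many thousands of subjects.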
The survival function
Exact expressions for the survival and hazard functions of
the TSCE with piecewise constant parameters are available
in the literature [11]. If we assume a constant or gamma lag
time between the appearance of the first malignant cell and
clinical diagnosis, the survival function required in
expression (3) is given by Eq. (4),
where $S_2(t; \bar{\theta}(d_i))$ represents the TSCE model survival and
$f(\cdot)$ is the gamma density.
Ten-year risk predictions
Models optimized for subjects in the NHS and HPFS
cohorts are used to predict 10-year risk estimates for lung
cancer incidence. Competing causes of mortality are
adjusted for using standard actuarial methods for multiple
decrement life tables. All-cause annual risk estimates are
extracted from the National Center for Health Statistics
[12]. We use the 1989–1991 life tables for both cohorts.
The 95% confidence intervals (CI) are calculated by
sampling model variables from a Markov Chain Monte
Carlo (MCMC) simulation using the Metropolis–Hastings
algorithm.
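The adjustment for competing mortality can be illustrated with one common discrete-time reading of a multiple-decrement calculation. The exact actuarial procedure is not spelled out here, so the sketch below is an assumption: the function name, the yearly time step, and the additive treatment of the two decrements are all illustrative choices.

```python
def ten_year_risk(cancer_probs, other_death_probs):
    """Cumulative lung cancer risk in the presence of competing mortality.

    cancer_probs[k]      : probability of lung cancer diagnosis in year k
    other_death_probs[k] : probability of death from other causes in year k
    Both are conditional on being alive and cancer free at the start of year k.
    """
    if len(cancer_probs) != len(other_death_probs):
        raise ValueError("both inputs must cover the same years")
    at_risk = 1.0  # probability of still being alive and cancer free
    risk = 0.0
    for p_cancer, p_other in zip(cancer_probs, other_death_probs):
        risk += at_risk * p_cancer
        at_risk *= 1.0 - p_cancer - p_other  # remove both decrements
    return risk
```

With no competing mortality the function reduces to one minus the product of the yearly cancer-free probabilities; any positive other-cause mortality lowers the predicted risk because fewer subjects remain at risk.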
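Table 2 Parameter estimates [MLE (MCMC 95% CI)]. A single entry spanning both cohort columns in the original is shown here as "both cohorts".

| Group | Parameter | Symbol | NHS | HPFS |
| --- | --- | --- | --- | --- |
| Fixed parameters | Stem cell population | $X$ | $10^7$ (both cohorts) | |
| Fixed parameters | Initiated cells’ division rate | $\alpha$ | 3 (both cohorts) | |
| Fixed parameters | Gamma-distributed lag time mean | | 5 (both cohorts) | |
| Background rates | Initiation & malignant-conversion rate | $\mu_0 = \mu_1$ | 8.14e-8 (5.51e-8, 1.27e-7) (both cohorts) | |
| Background rates | Initiated cells’ promotion rate | $g = \alpha - \beta - \mu_1$ | 0.0956 (0.0772, 0.1106) (both cohorts) | |
| Background rates | Gamma-distributed lag time std | | 3.28* (both cohorts) | |
| Tobacco coefficients | Tobacco promotion rate coefficient† | $g_c$ | 0.1458 (0.1010, 0.1752) | 0.1123 (0.0802, 0.1500) |
| Tobacco coefficients | Tobacco promotion rate power† | $g_e$ | 0.5171 (0.4703, 0.5945) (both cohorts) | |
| Tobacco coefficients | Tobacco malignant-conversion coefficient | $\mu_{1c}$ | 0.2095 (0.1565, 0.6691) | 0.5339 (0.2876, 1.6972) |
| Tobacco coefficients | Tobacco malignant-conversion power | $\mu_{1e}$ | 0.4684 (0.1083, 0.5483) (both cohorts) | |
| | Loglik | | 11696.40 | |

* 95% CI not calculated. Fixed at MLE value during MCMC simulation
† Applies also to the initiated cells’ division rate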
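$$S(t; \bar{\theta}(d_i)) = \begin{cases} S_2(t - t_{lag}; \bar{\theta}(d_i)) & \text{if lag time is constant,} \\[1ex] 1 - \displaystyle\int_0^t \left(1 - S_2(u; \bar{\theta}(d_i))\right) f(t - u)\, du & \text{if lag time is gamma-distributed,} \end{cases} \qquad (4)$$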
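The gamma-distributed case of Eq. (4), $S(t) = 1 - \int_0^t (1 - S_2(u)) f(t - u)\, du$, can be evaluated numerically as sketched below. The paper reports using Gauss–Legendre quadrature in Fortran; this dependency-free Python sketch swaps in composite Simpson's rule instead. The function names are hypothetical, and the default lag mean (5) and standard deviation (3.28) are the values listed in Table 2.

```python
import math


def gamma_pdf(x, mean, std):
    """Gamma density parameterized by its mean and standard deviation.

    Returning 0 at x = 0 is correct only when the shape exceeds 1, which
    holds for the Table 2 values (shape is about 2.3).
    """
    if x <= 0.0:
        return 0.0
    shape = (mean / std) ** 2
    scale = std ** 2 / mean
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def survival_with_gamma_lag(s2, t, mean=5.0, std=3.28, n=400):
    """Eq. (4), gamma case, by composite Simpson's rule with n (even) panels.

    s2(u) is the survival function for the first malignant cell.
    """
    if t <= 0.0:
        return 1.0
    step = t / n
    total = 0.0
    for k in range(n + 1):
        u = k * step
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * (1.0 - s2(u)) * gamma_pdf(t - u, mean, std)
    return 1.0 - total * step / 3.0
```

Because the lag delays diagnosis, the lagged survival at any age is at least as large as the unlagged survival `s2` at the same age.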
Ratio of age-adjusted hazards
For any particular smoking history, we use the ratio of age-
adjusted hazards as a measure of the lung cancer relative
risk between the NHS and HPFS. This ratio is calculated as
follows. We compute the TSCE model age-specific inci-
dence (ages 40–80) in each cohort using the corresponding
maximum likelihood estimate (MLE) parameters and the
specific smoking history of interest. We then adjust for age
in each cohort using the 1990 US total white population
and compute the ratio of age-adjusted hazards. In order to
calculate a 95% CI of the estimated ratio, we obtain
independent samples of the model parameters (model
described in Joint Model section) via Markov Chain Monte
Carlo (MCMC) simulations with the Metropolis–Hastings
algorithm [13]. For each set of parameters in the MCMC
run, we compute the TSCE hazard in the NHS and HPFS
(using the specific smoking history of interest), adjust for
age in each cohort and compute the relative ratio between
females and males. We then calculate the 95% CI of the
ratio of age-adjusted hazards.
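The steps above amount to direct age standardization followed by a percentile interval over the MCMC draws. The Python sketch below illustrates both pieces; the function names and the nearest-rank percentile rule are assumptions, and any rates or weights passed in are supplied by the caller rather than taken from the 1990 US population table.

```python
import math


def age_adjusted_rate(rates, weights):
    """Direct standardization: weighted mean of age-specific rates."""
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)


def adjusted_ratio(rates_female, rates_male, weights):
    """Ratio of age-adjusted rates using one common standard population."""
    return (age_adjusted_rate(rates_female, weights)
            / age_adjusted_rate(rates_male, weights))


def percentile_interval(samples, level=0.95):
    """Equal-tailed interval from a list of draws (nearest-rank rule)."""
    ordered = sorted(samples)
    last = len(ordered) - 1
    lower = ordered[math.floor((1.0 - level) / 2.0 * last)]
    upper = ordered[math.ceil((1.0 + level) / 2.0 * last)]
    return lower, upper
```

In use, `adjusted_ratio` would be evaluated once per MCMC parameter draw and the resulting list passed to `percentile_interval` to obtain the 95% CI of the ratio.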
Estimation procedure
Estimation of the parameters is done via maximum likeli-
hood methods. The background rates and the dose–
response relationships are estimated by maximizing the
likelihood for the observed cancer incidence using the
piecewise constant cigarette exposures for each
individual. The likelihood function calculation and its
maximization are done by High Performance Fortran
routines. The Nelder–Mead simplex and the modified
Davidon-Fletcher-Powell algorithms are used for the opti-
mization. Gauss-Legendre quadratures are used for the
integration required for the computation of the survival
function when the gamma-distributed lag time (time from
malignant transformation to diagnosis) is used.
We used two estimation procedures. In the first, we fit
the background parameters to the never-smokers only and
then keeping them constant, fit the model to the entire
cohort to optimize the dose–response parameters. Second,
we fit the model to the entire cohort and estimate all the
parameters simultaneously. We find that both approaches
lead to similar fits in terms of the likelihood function.
However, the first provides better fits to the number of
cancer cases in each sub-group, so it is preferred to the
latter.¹ All the results presented here are based on the first
estimation procedure.
Results
Not all the parameters in the TSCE model are identifiable
[11]. We use the specific parameterization shown in
Table 2. To start, we assume that all the TSCE model
parameters can be affected by cigarette exposure. Using
likelihood-ratio tests, we reduce the model to describe the
cohorts’ lung cancer incidence with as few parameters as
possible. We find that in both cohorts, only the net cell
proliferation and the malignant conversion rate have a
statistically significant dose–response.² In addition, we find
that using a gamma-distributed lag time improves the
model fit significantly in both cohorts.³ Table 2 shows the
reduced set of parameters. The corresponding 95% CI are
constructed via MCMC simulations with the
Metropolis–Hastings algorithm [13]. The TSCE model describes lung
cancer incidence in both cohorts well, as can be seen from
Fig. 2.
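The likelihood-ratio tests used for model reduction can be illustrated for the simplest case of one additional parameter. The sketch assumes the usual chi-square reference distribution with one degree of freedom, which the text does not state explicitly; the function name is hypothetical.

```python
import math


def lrt_pvalue_df1(loglik_full, loglik_reduced):
    """P-value of a likelihood-ratio test with one extra parameter.

    For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    if statistic <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(statistic / 2.0))
```

Under that assumption, a gain of eight log-likelihood points for a single extra parameter, as reported for the gamma-distributed lag, corresponds to a test statistic of 16 and a p-value well below 0.001.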
Independent models
First, we fit our models to both cohorts independently. In
both cohorts we find that the primary etiological mecha-
nism for lung cancer appears to be smoking-related
promotion (increased clonal expansion rate). The fitted
models have a highly significant sub-linear dose–response
on the promotion of premalignant lesions. These results are
in agreement with a previous joint analysis of the lung
cancer mortality in the British doctors’ and the American
Cancer Society CPS-I and CPS-II cohorts [10]. Interest-
ingly, the results are closer to the fits to the CPS-II cohort,
which was roughly contemporaneous with NHS and HPFS,
than to fits to the earlier CPS-I and British doctors’ cohorts.
The NHS, HPFS and CPS-II cohorts showed a stronger
dose–response of tobacco on promotion than the earlier cohorts,
but a reduced effect on initiation. These differences may in
part be explained by changes in cigarette composition, with
higher levels of nitrosamines in the newer cigarettes acting
as promoters, while the lower tar levels may be associated
with the lower apparent initiation rate. We also find a
significant dose–response in the malignant conversion of
premalignant lesions in the NHS and HPFS. This was not
seen in CPS-II, possibly because the data did not include
follow-up for changes in smoking intensity. A dose–
response on malignant conversion has relatively short term
effects on incidence rates.
¹ We also tested disjoint models for never and ever smokers.
However, these models also lead to similar fits, but with a larger
number of parameters.
² However, a smoking effect on initiation almost doubling the
background rate is still consistent with the data.
³ A gain of eight log-likelihood points with only one more parameter.
Interestingly, all parameter estimates are similar in the
NHS and HPFS cohorts, suggesting that a common model
could describe the incidence in both.
Joint model
There are reports in the literature suggesting that, for a
given level of smoking, women are at higher risk of lung
cancer than men [1–3, 5, 6]. However, a recent analysis of
the NHS and HPFS by Bain et al. [4] found no statistically
significant gender differences in lung cancer rates
among smokers for a given level of smoking.
In a later correction to the original
publication, Bain et al. [14] reported a gender difference
among ex-smokers with the risk in women being 1.5 rel-
ative to men. Wakelee et al. [15] suggested in a recent
analysis of several large cohort studies, including the NHS
and HPFS, that the lung cancer incidence among never
smokers is higher in women. However, although their
estimated age-adjusted lung cancer incidence among never
smokers is slightly higher in the NHS than in the HPFS,
they do not reject the equality of the never smoker lung
cancer rates in the two cohorts.
In order to address the issue of gender differences, we
explored a joint model in the two cohorts. Multistage
models allow us to test for specific gender differences in
the initiation, promotion and malignant conversion rates of
lung cancer. Using likelihood-ratio tests, we cannot reject
the equality of the background parameters between females
and males, although we can reject the equality of all the
model parameters. In particular, a model with different
tobacco-induced promotion and malignant conversion
coefficients between women and men is the overall
preferred model. Table 2 shows the parameter estimates of the
preferred joint model. All the figures in this article are
obtained using the parameter estimates of the preferred
joint model.
The NHS and HPFS cohorts contain information on
never, current, and former smokers. Figure 2 shows the
lung cancer incidence among never, former, and current
smokers in both cohorts and the model predictions. The
bottom panels in Fig. 2 show the number of lung cancer
cases in the NHS and HPFS as a function of years since
quitting.
Discussion
Methods of analysis based on ideas of multistage
carcinogenesis are fully parametric and allow complex
patterns of exposure to multiple covariates to be explicitly
considered [9]. In the analyses reported in this article, we
have explicitly considered individual smoking histories,
including age at start of smoking, changes in levels of
smoking, and age at quitting among ex-smokers. Models
incorporating detailed smoking information on the indi-
vidual level are useful in exploring the consequences of
intervention strategies to modify smoking habits.

[Fig. 2 NHS and HPFS lung cancer incidence. For each cohort there are panels for never smokers, smokers, and ex-smokers plotted against age (40–80, rates per 100,000), and a bottom panel plotted against years since quitting. Solid line, maximum likelihood estimate from joint fit of the NHS and HPFS; dashed lines, MCMC 95% CI. The incidence is calculated by summing individual one-year integrated hazards over all subjects at risk. Stars show the ratio of observed lung cancer cases to person-years at risk in five-year bins, with 95% confidence bars based on Poisson assumptions. Please note the different scales in the panels.]

Moreover, being biologically based, multistage models allow the
investigation of the effects of smoking on lung cancer
initiation, promotion and malignant conversion. Hence,
multistage models provide a natural framework to evaluate
the potential benefits of chemo-prevention and pharmaco-
logical intervention strategies based on mode of action of
the intervention. Finally, analyses based on multistage
models begin with a completely different set of assump-
tions and therefore complement the traditional approaches.
In particular, these analyses do not assume proportionality
of hazards, a very strong assumption that appears to be
inappropriate in the case of lung cancer and smoking [16].
Previous analyses using multistage models
The Two-stage Clonal Expansion Model has been used to
describe the lung cancer incidence and mortality in several
cohort and case–control studies [9, 10, 17–19]. In all of
them, smoking-related promotion has been found to be the
primary etiological mechanism of lung carcinogenesis.
Interestingly, analyses of older datasets have shown also an
effect of smoking on lung cancer initiation and no effect on
malignant conversion [9, 10]. However, exactly the
opposite has been found in more recent datasets [18, 19]. In
particular, Heidenreich et al. [18] found in a case–control
study in Germany that smoking has significant effects on
promotion and malignant conversion and no effects on
initiation. More recently, Schollnberger et al. [19] found
similar patterns in a large cohort study carried out in 10
European countries. Interestingly, Schollnberger et al.
reported that a common model described lung cancer
incidence in males and females in the European Prospec-
tive Investigation into Cancer and Nutrition (EPIC). They
concluded that gender differences in lung cancer risk are
due entirely to differences in smoking habits. Hazelton
et al. [10] also found a limited effect of tobacco on lung
cancer initiation in the CPS-I study; however, no effect on
malignant conversion was seen in that cohort. These dif-
ferences may in part be explained by changes in cigarette
composition, with higher levels of nitrosamines in the
newer cigarettes acting as promoters, while the lower tar
levels may be associated with the lower apparent initiation
rate. Additionally, the smoking information available in the
older cohorts may not have been detailed enough to detect
an effect on malignant conversion.
Incidence among life-long non-smokers
Our analyses indicate that the incidence of lung cancer
among life-long non-smokers is virtually identical in the
two cohorts. The incidence curves predicted by our model
along with observed incidence rates in both cohorts are
shown in the top panels of Fig. 2 and in the left panel of
Fig. 3a.
Incidence among continuing smokers
Figure 3a shows the age-specific incidence curves gener-
ated by the joint model for female and male smokers of 20
and 40 cigarettes per day. The second panels of Fig. 2
show the age-specific incidence rates among continuing
smokers in both cohorts along with the incidence curves
generated by our model.
[Fig. 3 Age-specific lung cancer incidence rates and relative risk of smoking and quitting. Maximum likelihood lung cancer incidence from joint model among 20 and 40 cigarette smokers in the NHS and HPFS. Smoking starts at age 20. Former smokers quit at ages 30 or 50. (a) Age-specific incidence rates of lung cancer: predicted lung cancer incidence (rates per 100,000 against age) from joint model for never and current smokers in the NHS and HPFS. Please note the different scales in the panels. (b) Relative risk of smoking and quitting: relative risk among smokers (current/never smoker) and among ex-smokers (former/current smoker) for 20 and 40 cigarettes, plotted against age for the NHS and the HPFS.]
The relative hazard associated with smoking 20 and 40
cigarettes per day in each cohort is shown in the top panels
of Fig. 3b. It is clear from this figure that the relative risks
associated with smoking are strongly modified by duration
of smoking. That this observation is not an artifact of our
model can be seen from the directly computed rate ratios in
the Cancer Prevention Study I (Burns et al. [20], Table 11),
which show a similar concave-down picture not only
for lung cancer but also for other causes of mortality
associated with cigarette smoking. The initial increase in
RR with duration of smoking can be directly attributed to
the strong influence of tobacco on promotion. The later
decline can be attributed to the strong increase in non-
smoker incidence rates of lung cancer with age with a
concomitant leveling off of the incidence rates among
smokers predicted by the model. The strong modification
of RR by duration of smoking suggests that the propor-
tional hazards model may not be the appropriate tool for
analyses of these data.
A common model for lung cancer incidence in the NHS
and HPFS (identical model parameters) is rejected by the
likelihood-ratio test (see Joint Model section). The best
fitting model indicates that smoking-induced promotion is
somewhat higher among females, whereas smoking-
induced malignant conversion is somewhat lower. As a
result of these opposing effects on smoking-induced lung
cancer risk, the incidence curves are rather similar as
shown in the second panels of Fig. 2 and in the right panel
of Fig. 3a. The evidence of a larger effect of smoking on
promotion among females is consistent with a synergistic
effect with estrogens [2, 6], and with effects of gastrin-
releasing peptide (GRP) expression in females [21]. GRP
stimulates cell proliferation in tumors [22] and appears to
be expressed more frequently in female than in male non-
smokers and activated earlier in women in response to
tobacco exposure than in men [21]. The hazard among
females relative to that among males for smokers of 20 and
40 cigarettes per day is shown in Fig. 4. This figure shows
that the relative risk increases gradually with duration of
smoking, but the confidence bands generated by MCMC
methods include 1. For smokers of 20 cigarettes per day,
the ratio of age-adjusted female to male rates is 1.1 (95%
CI = 0.77–1.29)⁴ and is not statistically significant, a
finding that is consistent with that reported in Bain et al.
(2004). For smokers of 40 cigarettes per day, the ratio of
age-adjusted female to male rates is 1.2 (95% CI = 0.80–
1.64).
Incidence among ex-smokers
The bottom panels of Fig. 2 show the incidence rate among
ex-smokers as a function of time since quitting.

[Fig. 4 Women/Men hazard ratio for current and former smokers. Solid line, maximum likelihood hazard ratio from independent fits to the NHS and HPFS. Dashed lines, MCMC 95% CI. Left panels, Women/Men hazard ratio for smokers of 20 and 40 cigarettes per day. Right panels, Women/Men hazard ratio for ex-smokers of 20 and 40 cigarettes per day (quit at age 50). Horizontal axis: age. Smoking in all panels starts at age 20.]

⁴ Age-adjusted to the 1990 US total population. Please see Materials
and Methods for details.

The model predictions describe the data well in both cohorts except for
the first few years after quitting. We attribute this dis-
crepancy in the first few years to quitters who stopped
smoking because they had developed symptoms of lung
cancer. This phenomenon is well known [23, 24]. The effect
of smoking on the rate of malignant conversion implies a
rather quick decrease in risk after quitting, and the effect on
the rate of promotion implies a continuing decrease in risk
over a prolonged period of time as seen in previous analyses
of mortality data (Hazelton et al. [10]). Bottom panels of
Fig. 3b show the decrease in lung cancer incidence among
ex-smokers relative to that among continuing smokers. The
pattern of decrease in both cohorts is consistent with
that reported for mortality by Hazelton et al. [10], by Peto
et al. [25] and by Rachet et al. [16].
Figure 4 shows the female to male hazard ratio for
ex-smokers. This ratio is higher than the ratio of hazards
for continuing smokers (left panels of the figure). The
hazard ratio quickly increases to about 1.5 and remains
approximately constant. It is important to mention that
these calculations also depend on the assumed age at start
(age 20) and age at quitting (age 50). The confidence
bounds on the ratio indicate that it is borderline significant,
consistent with the report by Bain et al. [14]. For
ex-smokers of 20 cigarettes per day, the ratio of age-adjusted
female to male rates is 1.35 (95% CI = 0.99–1.56).
For ex-smokers of 40 cigarettes per day, the ratio of age-
adjusted female to male rates is 1.48 (95% CI = 1.08–
1.90).
The estimated benefits of smoking cessation depend
largely on the available information at older ages, where
longer durations of both abstinence and smoking are
observed. The age-distribution of individuals differs
between the two cohorts, with a larger proportion of older
individuals present in the HPFS. Therefore, it is plausible
that the lower risk among the ex-smokers in the HPFS
predicted by the model is attributable, at least in part, to the
difference in age-distribution.
Ten-year risk predictions
Ten-year risk predictions with 95% CIs are shown in
Table 3 for different smoking patterns among continuing
smokers and for former smokers who quit at the beginning
of the 10-year risk-projection period. These calculations
may overestimate the 10-year risk of lung cancer incidence
for heavy smokers, because population-based annual life
tables [12] were used to adjust for competing risks (no life
tables for different smoking levels were available). The
calculation of risk for smokers who quit at the beginning of
the 10-year interval was made by assuming that the
Table 3 The 10-year risk projections for smokers who smoke for 25, 40, or 50 years and continue to smoke or quit at ages 55, 65, or 75 years
based on models for White male and female smokers in the NHS and HPFS cohorts [% risk (95% CI)]

Age  25 years                            40 years                            50 years
     Quit              Still smoking     Quit              Still smoking     Quit              Still smoking
NHS: 20-cig smokers
55   0.8 (0.6–1.2)     1.7 (0.9–3.1)     2.0 (1.6–2.7)     3.8 (2.3–6.5)     *                 *
65   2.0 (1.5–2.7)     3.8 (2.0–6.6)     4.6 (3.7–5.7)     7.9 (4.8–12.0)    6.7 (5.6–8.0)     10.7 (7.1–15.1)
75   4.0 (2.9–5.2)     6.8 (3.7–10.6)    7.8 (6.3–9.5)     12.0 (7.7–16.7)   10.3 (8.5–12.6)   14.9 (10.2–20.1)
NHS: 40-cig smokers
55   1.8 (1.3–2.7)     4.2 (2.0–7.6)     5.4 (4.0–7.0)     10.4 (6.2–15.7)   *                 *
65   4.1 (2.9–5.7)     8.4 (4.5–13.8)    10.0 (8.3–12.15)  16.9 (11.3–23.0)  12.3 (9.6–15.7)   18.8 (13.2–25.6)
75   7.3 (5.4–9.4)     12.7 (7.4–18.7)   13.7 (11.2–16.9)  20.4 (14.4–27.3)  14.9 (10.6–20.7)  20.8 (14.7–29.1)
HPFS: 20-cig smokers
55   0.7 (5.2–1.1)     1.8 (0.8–3.1)     1.5 (1.0–2.0)     3.4 (1.6–5.4)     *                 *
65   1.7 (1.2–2.3)     3.8 (1.8–5.9)     3.2 (2.4–4.1)     6.6 (3.6–9.4)     4.3 (3.5–5.3)     8.3 (5.3–11.2)
75   3.1 (2.3–4.2)     5.9 (3.2–8.6)     5.2 (4.2–6.4)     9.0 (6.1–12.2)    6.5 (5.4–7.8)     10.7 (7.9–14.1)
HPFS: 40-cig smokers
55   1.4 (0.8–2.2)     3.8 (1.4–6.4)     3.4 (2.1–4.7)     7.9 (3.5–11.5)    *                 *
65   3.1 (1.9–4.2)     7.1 (3.2–10.5)    6.3 (4.5–7.8)     12.4 (7.2–16.8)   7.8 (6.1–9.6)     13.9 (10.0–18.3)
75   5.1 (3.5–6.6)     9.6 (5.3–13.6)    8.5 (6.9–10.5)    14.2 (10.3–18.7)  9.5 (6.4–12.2)    14.7 (10.3–19.6)
Note: Life tables are used to adjust for death from competing causes. Model-based 10-year risks are shown for each smoking pattern, with
MCMC 95% CIs. Projections for individuals who quit smoking and continue to abstain for the following 10 years assume the model variables
revert to background values following smoking cessation. Asterisks are placed in cells that correspond to unrealistically early ages for starting
smoking. These rates are not generalizable and are probably lower than expected for the general population, as the cohorts are more educated and
healthy
dose–response functions return to background levels when
smoking stops. These estimates show the benefit of quitting
for any dose and duration of smoking. Risk estimates are
somewhat higher in the NHS than in the HPFS. These
estimates are consistent with 10-year risk projections based
on data from the Beta-Carotene and Retinol Efficacy Trial (Table 2
of Bach et al. [26]). Estimates in Table 3 are higher than
10-year projections of lung cancer mortality risk based on
the CPS-I and CPS-II cohorts (Table 3 in Hazelton et al.
[10]).
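As a rough illustration of how competing causes of death enter a 10-year risk projection, the following Python sketch integrates a lung cancer hazard against all-cause survival. The functions `lung_hazard` and `other_hazard` and all numerical values are hypothetical placeholders chosen only for illustration; they are neither the fitted TSCE hazard nor the life tables [12] used in the analysis.

```python
import math

def ten_year_risk(age, lung_hazard, other_hazard, years=10.0, steps=1000):
    """Cumulative lung cancer incidence over [age, age + years].

    Midpoint-rule approximation of
        integral of h_lc(u) * exp(-integral of (h_lc + h_other)) du,
    i.e. a case is counted only if the person is still alive and cancer-free.
    """
    du = years / steps
    cum_hazard = 0.0   # total hazard accumulated since `age`
    risk = 0.0
    for i in range(steps):
        u = age + (i + 0.5) * du
        h_lc, h_oc = lung_hazard(u), other_hazard(u)
        # probability of being alive and cancer-free at the step midpoint
        alive_mid = math.exp(-(cum_hazard + 0.5 * (h_lc + h_oc) * du))
        risk += h_lc * alive_mid * du
        cum_hazard += (h_lc + h_oc) * du
    return risk

# Hypothetical hazards (per person-year), for illustration only
def lung_hazard(age):
    return 2e-3 * (age / 65.0) ** 5

def other_hazard(age):
    return 1e-4 * math.exp(0.085 * (age - 30.0))

def no_other_causes(age):
    return 0.0

risk_adjusted = ten_year_risk(65.0, lung_hazard, other_hazard)
risk_unadjusted = ten_year_risk(65.0, lung_hazard, no_other_causes)
print(f"adjusted: {risk_adjusted:.4f}, unadjusted: {risk_unadjusted:.4f}")
```

In this sketch, understating the competing mortality (for example, by applying population life tables to heavy smokers) moves the adjusted risk toward the unadjusted one, which is the direction of bias noted above.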
Conclusions
We conclude that the risk of lung cancer is similar among
non-smoking and smoking men in the HPFS and women in
the NHS, but that the lung cancer risk among ex-smokers is
higher in the NHS. Within the framework of the TSCE
model, this difference can be attributed to higher smoking-
related promotion in the NHS cohort. However, it is
plausible that this is just an artifact produced by the dif-
ference in age-distribution between the two cohorts. In both
cohorts, we find that the main effect of cigarette smoke is
on the promotion of premalignant lesions. This is consis-
tent with previous analyses of several cohort and case–
control studies using the TSCE model [9, 10, 17–19]. The
relative risk of smoking is strongly dependent on duration
of smoking. For a smoker who begins to smoke before the
age of 20, the RR increases to about age 70 and declines
thereafter. This pattern is consistent with that observed in
other studies [10, 20]. Among ex-smokers, the relative risk
of former versus current smokers appears to decrease more
strongly at higher smoking levels. This finding is consistent
with the analysis of CPS-I, CPS-II and the British Doctors
cohorts in Hazelton et al. [10] and with the analysis of a
large case–control study in Rachet et al. [16].
Acknowledgments We thank the Cancer Intervention and Surveil-
lance Modeling Network (CISNET) Group, Dr. Anup Dewanji and
Dr. Jihyoun Jeon for useful suggestions. We acknowledge support
from NIH grants RO1 CA047658 and UO1 CA97415.
Appendix
In this section we show how to calculate explicitly the
individual likelihoods (Eq. 2) as a function of the model
parameters and the individual's smoking history. For simplicity,
we assume a constant lag time, $t_{lag}$, between the
appearance of the first malignant cell and diagnosis, so the
survival at any age $t$, $S(t)$, is equal to the TSCE model
survival at age $t - t_{lag}$, $S_2(t - t_{lag})$ (see Eq. 4). We use the
expressions derived in Heidenreich et al. [11] for the TSCE
model survival in the case of piecewise-constant parameters.
From Heidenreich et al. [11], we have that the survival
function and its derivative at any age $t$ are
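$$S(t) = S_2(t - t_{lag}) = \exp\left\{ \sum_{j=1}^{k} \frac{\mu_{0,j} X}{\alpha_j} \ln\left[ \frac{q_j - p_j}{f_j(t_{j-1}, t_k)} \right] \right\}, \tag{5}$$

$$S'(t) = S_2'(t - t_{lag}) = S_2(t - t_{lag}) \times \sum_{j=1}^{k} \frac{-\mu_{0,j} X}{\alpha_j} \frac{\partial}{\partial t_k} \ln\left( f_j(t_{j-1}, t_k) \right), \tag{6}$$

where $k$ is the number of age-periods with different
smoking-dose before age $t_k \equiv t - t_{lag}$, $[t_{j-1}, t_j]$ denote the
end-points of the $j$-th age-period, $d_j$ is the smoking-dose
during the $j$-th age-period, $\mu_{0,j}$, $\alpha_j$, $\gamma_j$, $\mu_{1,j}$ denote the
parameter values during the $j$-th age-period and

$$\gamma_j = \gamma\,(1 + \gamma_c d_j^{\gamma_e}), \qquad \alpha_j = \alpha\,(1 + \gamma_c d_j^{\gamma_e}), \tag{7}$$

$$\mu_{1,j} = \mu_1 (1 + \mu_{1c} d_j^{\mu_{1e}}), \qquad \mu_{0,j} = \mu_0 (1 + \mu_{0c} d_j^{\mu_{0e}}), \tag{8}$$

$$p_j, q_j = \frac{1}{2}\left( -\gamma_j \mp \sqrt{\gamma_j^2 + 4 \alpha_j \mu_{1,j}} \right), \tag{9}$$

$$\tilde{y}_k = 0, \qquad \tilde{y}_{j-1} = \frac{\alpha_{j-1}}{\alpha_j}\, \frac{(\tilde{y}_j - p_j)\, q_j\, e^{q_j (t_{j-1} - t_j)} + (q_j - \tilde{y}_j)\, p_j\, e^{p_j (t_{j-1} - t_j)}}{f_j(t_{j-1}, t_k)}, \tag{10}$$

$$f_j(t_{j-1}, t_k) = (\tilde{y}_j - p_j) \exp\{ q_j (t_{j-1} - t_j) \} + (q_j - \tilde{y}_j) \exp\{ p_j (t_{j-1} - t_j) \}, \tag{11}$$

$$\frac{\partial}{\partial t_k} f_k(t_{k-1}, t_k) = \left[ \exp\{ q_k (t_{k-1} - t_k) \} - \exp\{ p_k (t_{k-1} - t_k) \} \right] p_k q_k, \tag{12}$$

$$\frac{\partial}{\partial t_k} f_j(t_{j-1}, t_k) = \left[ \exp\{ q_j (t_{j-1} - t_j) \} - \exp\{ p_j (t_{j-1} - t_j) \} \right] \frac{\partial}{\partial t_k} \tilde{y}_j, \tag{13}$$

$$\frac{\partial}{\partial t_k} \tilde{y}_{k-1} = \frac{\alpha_{k-1}}{\alpha_k}\, \frac{(q_k - p_k)^2\, e^{(p_k + q_k)(t_{k-1} - t_k)}}{\left( f_k(t_{k-1}, t_k) \right)^2}\, p_k q_k, \tag{14}$$

$$\frac{\partial}{\partial t_k} \tilde{y}_{j-1} = \frac{\alpha_{j-1}}{\alpha_j}\, \frac{(q_j - p_j)^2\, e^{(p_j + q_j)(t_{j-1} - t_j)}}{\left( f_j(t_{j-1}, t_k) \right)^2}\, \frac{\partial}{\partial t_k} \tilde{y}_j. \tag{15}$$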
To calculate the individual likelihood (Eq. 2), we only
need to evaluate expressions (5) and (6) at the age at entry
($t = a_e$) and at the age at last observation ($t = a_l$), as required.
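The backward recursion in Eqs. 5 and 7–11 can be sketched in Python as follows. This is a minimal illustration and not the authors' Fortran or R code: the function names (`dose_response`, `roots`, `log_survival`) are our own, and the parameter values and dose–response coefficients are hypothetical stand-ins rather than the Table 2 estimates. The sketch checks two properties that follow from the formulas: a single age-period reproduces the closed-form TSCE survival, and splitting a period with unchanged parameters leaves the survival unchanged.

```python
import math

def dose_response(background, coef, expo, dose):
    """Power-law dose-response of Eqs. 7-8: background * (1 + coef * dose**expo)."""
    return background * (1.0 + coef * dose ** expo)

def roots(gamma, alpha, mu1):
    """Eq. 9: p is the negative root, q the positive root."""
    disc = math.sqrt(gamma ** 2 + 4.0 * alpha * mu1)
    return 0.5 * (-gamma - disc), 0.5 * (-gamma + disc)

def log_survival(breaks, periods, mu0_x):
    """ln S_2(t_k) from Eq. 5, using the backward recursion of Eqs. 10-11.

    breaks  : [t_0, t_1, ..., t_k], end-points of the age-periods
    periods : k tuples (gamma_j, alpha_j, mu1_j)
    mu0_x   : k values of mu_{0,j} * X
    """
    y = 0.0                                   # y_k = 0 starts the recursion
    total = 0.0
    for j in range(len(periods), 0, -1):
        gamma, alpha, mu1 = periods[j - 1]
        p, q = roots(gamma, alpha, mu1)
        dt = breaks[j - 1] - breaks[j]        # t_{j-1} - t_j (non-positive)
        eq, ep = math.exp(q * dt), math.exp(p * dt)
        f = (y - p) * eq + (q - y) * ep       # Eq. 11
        total += mu0_x[j - 1] / alpha * math.log((q - p) / f)
        if j > 1:                             # Eq. 10: step back to y_{j-1}
            alpha_prev = periods[j - 2][1]
            y = alpha_prev / alpha * ((y - p) * q * eq + (q - y) * p * ep) / f
    return total

# Hypothetical values for illustration only (not the Table 2 estimates)
GAMMA, ALPHA, MU1, MU0_X = 0.096, 3.0, 8.2e-8, 0.8
GAMMA_C, GAMMA_E = 0.1, 0.6      # assumed promotion dose-response
MU1_C, MU1_E = 0.05, 1.0         # assumed malignant-conversion dose-response

never = (GAMMA, ALPHA, MU1)
dose = 30.0
promo = dose_response(1.0, GAMMA_C, GAMMA_E, dose)   # same factor for gamma and alpha
smoking = (GAMMA * promo, ALPHA * promo, dose_response(MU1, MU1_C, MU1_E, dose))

one_period = log_survival([0.0, 38.0], [never], [MU0_X])
two_periods = log_survival([0.0, 20.0, 38.0], [never, never], [MU0_X, MU0_X])
smoker = log_survival([0.0, 20.0, 38.0], [never, smoking], [MU0_X, MU0_X])

p, q = roots(*never)
closed_form = MU0_X / ALPHA * math.log(
    (q - p) / (q * math.exp(-p * 38.0) - p * math.exp(-q * 38.0)))
print(one_period, two_periods, closed_form, smoker)
```

Working with the log-survival rather than the survival itself keeps the sum in Eq. 5 explicit and makes the result easy to reuse in a log-likelihood.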
As an example, we calculate the likelihood contribution
of the individual whose smoking history is illustrated in the
top panel of Fig. 1 at the maximum likelihood estimates of
the joint-model (Table 2). The individual entered the NHS
study at age 43 (approx) and died at age 63 from other
causes. Hence, we must calculate the TSCE model survival
at ages $43 - t_{lag} = 38$ and $63 - t_{lag} = 58$ to obtain her
individual likelihood (Eq. 2). This individual started
smoking at age 20, at a constant rate of 30 cigarettes per
day, and continued with this pattern until age 42. Hence, to
calculate the TSCE model survival at age 38, we have to
consider two different age-periods, namely $[0, 20] \equiv [t_0, t_1]$
and $[20, 38] \equiv [t_1, t_2]$. From expression (5), we have that the
survival at age 43 is
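$$S(43) = S_2(38) = \exp\left\{ \sum_{j=1}^{2} \frac{\mu_{0,j} X}{\alpha_j} \ln\left[ \frac{q_j - p_j}{f_j(t_{j-1}, 38)} \right] \right\} \tag{16}$$

$$= \exp\left\{ \frac{\mu_{0,1} X}{\alpha_1} \ln\left[ \frac{q_1 - p_1}{f_1(0, 38)} \right] + \frac{\mu_{0,2} X}{\alpha_2} \ln\left[ \frac{q_2 - p_2}{f_2(20, 38)} \right] \right\}. \tag{17}$$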
In this particular example, the parameter values during
the first age-period are equal to the background parameters
(i.e., $d_1 = 0$). In contrast, during the second period, we
have to multiply the promotion rate $\gamma$ and the malignant
conversion rate $\mu_1$ by their corresponding dose–responses
(with $d_2 = 30$). We also multiply the initiated cells'
division rate, $\alpha$, by the dose–response of $\gamma$ in order to
keep them consistent. In this model, there is no dose–response
on the initiation rate, $\mu_0$, so it is constant across
all age-periods. Using Eqs. 5 and 7–15 and the parameter
estimates from Table 2, we obtain that
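$$\ln[S_2(38)] = \frac{(8.14 \times 10^{-8})(10^{7})}{3} \ln\frac{2.55 \times 10^{-6} - (-0.0960)}{0.0959} + \frac{(8.14 \times 10^{-8})(10^{7})}{5.54} \ln\frac{5.18 \times 10^{-6} - (-0.1765)}{0.1766} = -0.00116. \tag{18}$$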
So the survival at age 43 is $S(43) = S_2(38) = 0.9988$. To
calculate the TSCE model survival at age 58, we need to
consider the following age-periods: $[0, 20] \equiv [t_0, t_1]$, $[20, 42] \equiv [t_1, t_2]$,
$[42, 44] \equiv [t_2, t_3]$, $[44, 46] \equiv [t_3, t_4]$, $[46, 48] \equiv [t_4, t_5]$,
$[48, 55] \equiv [t_5, t_6]$, $[55, 57] \equiv [t_6, t_7]$ and $[57, 58] \equiv [t_7, t_8]$. At each
age-period, we must adjust the model parameters by the
corresponding dose–response. As before, we
calculate the survival at age 63 using Eqs. 5 and 7–15:
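$$S(63) = S_2(58) = \exp\left\{ \sum_{j=1}^{8} \frac{\mu_{0,j} X}{\alpha_j} \ln\left[ \frac{q_j - p_j}{f_j(t_{j-1}, 58)} \right] \right\} = 0.9768. \tag{19}$$

So the likelihood contribution of this individual is equal to

$$L_i(63, 43, \vec{\theta}(d_i)) = \frac{S(63)}{S(43)} = \frac{0.9768}{0.9988} = 0.9779. \tag{20}$$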
As a final example, let us assume that an individual with
the same smoking history is diagnosed with lung cancer at
age 63. In this case, we must obtain $S'(63) = S_2'(58)$ to
calculate the individual likelihood (Eq. 2). Using Eqs. 5–15,
we have that
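$$S'(63) = S_2'(58) = S_2(58) \times \sum_{j=1}^{8} \frac{-\mu_{0,j} X}{\alpha_j} \frac{\partial}{\partial t_k} \ln\left( f_j(t_{j-1}, 58) \right) = -3.1171 \times 10^{-3}. \tag{21}$$

So the likelihood contribution of this individual is equal to

$$L_i(63, 43, \vec{\theta}(d_i)) = \frac{-S'(63)}{S(43)} = \frac{3.1171 \times 10^{-3}}{0.9988} = 3.1208 \times 10^{-3}. \tag{22}$$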
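The two kinds of likelihood contribution used in Eqs. 20 and 22 differ only in the numerator. A minimal Python sketch, with a function name of our own choosing and the rounded survival values quoted above as inputs:

```python
import math

def likelihood_contribution(s_entry, s_last, ds_last=None):
    """Individual likelihood given survival at entry and at last observation.

    Censored (no lung cancer):  S(a_l) / S(a_e)      (as in Eq. 20)
    Diagnosed at a_l:          -S'(a_l) / S(a_e)     (as in Eq. 22)
    """
    if ds_last is None:
        return s_last / s_entry
    return -ds_last / s_entry

S_43, S_63, DS_63 = 0.9988, 0.9768, -3.1171e-3   # rounded values from the text

censored = likelihood_contribution(S_43, S_63)
case = likelihood_contribution(S_43, S_63, ds_last=DS_63)

# A cohort log-likelihood is the sum of the logs of such contributions
log_lik = math.log(censored) + math.log(case)
print(f"censored={censored:.4f} case={case:.4e} log-lik={log_lik:.4f}")
```

Dividing by the survival at entry conditions each contribution on the individual being free of lung cancer when she joined the cohort.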
Fortran and R code to compute the TSCE hazard and
survival functions in the case of piecewise-constant parameters
is available from the authors upon request.
References
1. Risch HA, Howe GR, Jain M, Burch JD, Holowaty EJ, Miller AB
(1993) Are female smokers at higher risk for lung cancer than
male smokers? A case–control analysis by histologic type. Am J
Epidemiol 138(5):281–293
2. Zang EA, Wynder EL (1996) Differences in lung cancer risk
between men and women: examination of the evidence. J Natl
Cancer Inst 88(3–4):183–192
3. Perneger TV (2001) Sex, smoking, and cancer: a reappraisal.
J Natl Cancer Inst 93(21):1600–1602
4. Bain C, Feskanich D, Speizer FE, Thun M, Hertzmark E, Rosner
BA, et al (2004) Lung cancer rates in men and women with com-
parable histories of smoking. J Natl Cancer Inst 96(11):826–834
5. Blot WJ, McLaughlin JK (2004) Are women more susceptible to
lung cancer? J Natl Cancer Inst 96(11):812–813
6. Patel JD, Bach PB, Kris MG (2004) Lung cancer in US women: a
contemporary epidemic. JAMA 291(14):1763–1768
7. Moolgavkar SH, Venzon DJ (1979) Two-event models for car-
cinogenesis: incidence curves for childhood and adult tumors.
Math Biosci 47(1):55–77
8. Moolgavkar SH, Knudson AG (1981) Mutation and cancer: a
model for human carcinogenesis. J Natl Cancer Inst 66(6):
1037–1052
9. Hazelton WD, Luebeck EG, Heidenreich WF, Moolgavkar SH
(2001) Analysis of a historical cohort of Chinese tin miners with
arsenic, radon, cigarette smoke, and pipe smoke exposures using
the biologically based two-stage clonal expansion model. Radiat
Res 156(1):78–94
10. Hazelton WD, Clements MS, Moolgavkar SH (2005) Multistage
carcinogenesis and lung cancer mortality in three cohorts. Cancer
Epidemiol Biomarkers Prev 14(5):1171–1181
11. Heidenreich WF, Luebeck EG, Moolgavkar SH (1997) Some
properties of the hazard function of the two-mutation clonal
expansion model. Risk Anal 17(3):391–399
12. US Decennial Life Tables for 1989–91. vol 1. Hyattsville,
Maryland: US Department of Health and Human Services,
National Center for Health and Statistics; 1997
13. Gilks WR, Richardson S, Spiegelhalter DJ (eds) (1996) Markov
chain Monte Carlo in practice. Interdisciplinary Statistics.
Chapman & Hall, London
14. Bain C, Feskanich D, Speizer FE, Thun M, Hertzmark E, Rosner
BA, et al (2004) Erratum. J Natl Cancer Inst 96(23):1796–1797
15. Wakelee HA, Chang ET, Gomez SL, Keegan TH, Feskanich D,
Clarke CA, et al (2007) Lung cancer incidence in never smokers.
J Clin Oncol 25(5):472–478
16. Rachet B, Siemiatycki J, Abrahamowicz M, Leffondre K (2004)
A flexible modeling approach to estimating the component
effects of smoking behavior on lung cancer. J Clin Epidemiol
57(10):1076–1085
17. Luebeck EG, Heidenreich WF, Hazelton WD, Paretzke HG,
Moolgavkar SH (1999) Biologically based analysis of the data for
the Colorado uranium miners cohort: age, dose and dose-rate
effects. Radiat Res 152(4):339–351
18. Heidenreich WF, Wellmann J, Jacob P, Wichmann HE (2002)
Mechanistic modelling in large case–control studies of lung
cancer risk from smoking. Stat Med 21(20):3055–3070
19. Schollnberger H, Manuguerra M, Bijwaard H, Boshuizen H,
Altenburg HP, Rispens SM, et al (2006) Analysis of epidemio-
logical cohort data on smoking effects and lung cancer with a
multi-stage cancer model. Carcinogenesis 27(7):1432–1444
20. Burns DM, Shanks TG, Choi W, Thun MJ, Heath CW, Garfinkel
L (1997) The American Cancer Society Cancer Prevention Study
I: 12-year followup of 1 million men and women. In: Burns DM,
Garfinkel L, Samet JM (eds) Changes in cigarette-related disease
risks and their implications for prevention and control. Smoking
and Tobacco Control, Monograph 8, NIH Publ No 97–4213, pp
113–304
21. Shriver SP, Bourdeau HA, Gubish CT, Tirpak DL, Davis AL,
Luketich JD, et al (2000) Sex-specific expression of gastrin-
releasing peptide receptor: relationship to smoking history and
risk of lung cancer. J Natl Cancer Inst 92(1):24–33
22. Siegfried JM, DeMichele MA, Hunt JD, Davis AG, Vohra KP,
Pilewski JM (1997) Expression of mRNA for gastrin-releasing
peptide receptor by human bronchial epithelial cells. Association
with prolonged tobacco exposure and responsiveness to bombe-
sin-like peptides. Am J Respir Crit Care Med 156(2 Pt 1):
358–366
23. Hauptmann M, Lubin JH, Rosenberg P, Wellmann J, Kreienbrock
L (2000) The use of sliding time windows for the exploratory
analysis of temporal effects of smoking histories on lung cancer
risk. Stat Med 19(16):2185–2194
24. Leffondre K, Abrahamowicz M, Siemiatycki J, Rachet B (2002)
Modeling smoking history: a comparison of different approaches.
Am J Epidemiol 156(9):813–823
25. Peto R, Darby S, Deo H, Silcocks P, Whitley E, Doll R (2000)
Smoking, smoking cessation, and lung cancer in the UK since
1950: combination of national statistics with two case–control
studies. BMJ 321(7257):323–329
26. Bach PB, Kattan MW, Thornquist MD, Kris MG, Tate RC,
Barnett MJ, et al (2003) Variations in lung cancer risk among
smokers. J Natl Cancer Inst 95(6):470–478