ORIGINAL PAPER

Analysis of lung cancer incidence in the nurses’ health and the health professionals’ follow-up studies using a multistage carcinogenesis model

Rafael Meza · William D. Hazelton · Graham A. Colditz · Suresh H. Moolgavkar

Received: 10 July 2007 / Accepted: 12 November 2007 / Published online: 6 December 2007

© Springer Science+Business Media B.V. 2007

Abstract We analyzed lung cancer incidence among non-smokers, continuing smokers, and ex-smokers in the Nurses Health Study (NHS) and the Health Professionals Follow-Up Study (HPFS) using the two-stage clonal expansion (TSCE) model. Age-specific lung cancer incidence rates among non-smokers are identical in the two cohorts. Within the framework of the model, the main effect of cigarette smoke is on the promotion of partially altered cells on the pathway to cancer. Smoking-related promotion is somewhat higher among women, whereas smoking-related malignant conversion is somewhat lower. In both cohorts the relative risk for a given daily level of smoking is strongly modified by duration. Among smokers, the incidence in NHS relative to that in HPFS depends both on smoking intensity and duration. The age-adjusted risk is somewhat larger in NHS, but not significantly so. After smokers quit, the risk decreases over a period of many years and the temporal pattern of the decline is similar to that reported in other recent studies. Among ex-smokers, the incidence in NHS relative to that in HPFS depends both on previous levels of smoking and on time since quitting. The age-adjusted risk among ex-smokers is somewhat higher in NHS, possibly due to differences in the age distribution between the two cohorts.

Keywords Lung cancer epidemiology · Lung cancer age-specific incidence · Never smokers lung cancer risk · Smokers relative risk · Ex-smokers relative risk · Multistage carcinogenesis · Two-stage clonal expansion model

Introduction

The Nurses Health Study (NHS) and the Health Professionals Follow-Up Study (HPFS) constitute outstanding datasets with which to investigate in detail the relationship between smoking and lung cancer, and to evaluate the influence of gender both on background and smoking-induced risks. We analyze the consequences of smoking and smoking cessation on the lung cancer incidence rates in the NHS and HPFS using multistage carcinogenesis models. This approach allows us to explicitly consider the entire smoking histories of individuals in these cohorts, including complex time-related factors, such as ages at start and quit, and changes in smoking habits.

We use likelihood-based methods to estimate the parameters of the two-stage clonal expansion (TSCE) model. Using the model with the estimated parameters, we construct age-specific incidence curves for non-smokers, and for smokers and ex-smokers with pre-specified histories of smoking. We also investigate the roles of daily intensity of smoking and of duration of smoking on lung cancer risk. In particular, for a given level of smoking, we examine the impact of duration of smoking on the relative risk (RR).

Electronic supplementary material The online version of this article (doi:10.1007/s10552-007-9094-5) contains supplementary material, which is available to authorized users.

R. Meza · W. D. Hazelton · S. H. Moolgavkar (✉)

Division of Public Health Sciences, Fred Hutchinson Cancer

Research Center, 1100 Fairview Avenue North, M2-B500,

Seattle, WA 98109-1024, USA

e-mail: [email protected]

G. A. Colditz

Channing Laboratory, Brigham and Women’s Hospital and

Harvard Medical School, Boston, MA, USA

G. A. Colditz

Siteman Cancer Center, Washington University Medical School,

St. Louis, MO, USA


Cancer Causes Control (2008) 19:317–328

DOI 10.1007/s10552-007-9094-5

The question of whether, for a given level of smoking, females are at greater risk than males of developing lung cancer has generated a great deal of debate [1–6]. The NHS consists entirely of females and the HPFS consists entirely of males. Our methods allow us to analyze the NHS and HPFS simultaneously using a common model; hence, we can evaluate in a single framework the similarities and differences in lung cancer risk in females and males. Finally, we use our fitted models to project lung cancer risks for various smoking scenarios.

Methods

The nurses health and the health professionals follow-up studies

The Nurses Health Study (NHS) was established in 1976 by Dr. Frank Speizer. The cohort consists of 121,700 nurses aged 30–55 at the beginning of follow-up. Every two years the nurses return mailed questionnaires about diseases, smoking status, hormone use, and diet, among many other health-related issues. The Health Professionals Follow-up Study (HPFS) was established in 1986 by Dr. Walter Willett. The cohort consists of 51,529 men in the health professions aged 40–75 at the beginning of follow-up. As in the NHS, the health professionals receive questionnaires every two years about diseases and health-related topics like smoking, physical activity, and medications taken. In addition, the participants respond to questionnaires about their diet every four years. Less than 10% of subjects in the NHS and 7% in the HPFS have been lost to follow-up. We exclude from our analysis individuals who do not have complete smoking information, non-Caucasians, and those with prior history of cancer (other than non-melanoma skin cancer). Table 1 describes the subpopulations of both cohorts that we use in our analysis.

It is important to mention that although the two studies began a decade apart, the age and birth-year distributions of both cohorts are quite similar.

Smoking histories

Smoking histories, which we denote d(t), are piecewise constant functions representing the number of cigarettes per day smoked by each subject at any particular age, t. We construct the smoking histories in the following way. Clearly d(0) = 0 and we keep this constant until the age at start of smoking. Subjects in both cohorts report their smoking intensities at different time points in the following categories: 0–4 cig/day, 5–14 cig/day, 15–24 cig/day, 25–34 cig/day, 35–44 cig/day, >45 cig/day. We assign the midpoint of each category as the corresponding intensity for that category, or 50 if the intensity is >45 cig/day. Once we have done this, we compute the ages at which subjects changed their smoking intensities (change points of d(t)) and assign the corresponding intensity value to d(t).

In particular, we use the smoking information reported in the initial questionnaire to calculate the change points up to the age at entry into the study. In the initial questionnaire, participants responded to questions about their past smoking habits. The information provided at entry differs between the cohorts. In the NHS, the nurses reported the age at which they started smoking (if they smoked), their average smoking rate, and, for ex-smokers, the quitting age. Thus, for the NHS subjects, there are at most two ages of interest before the age at entry, namely the age at start of smoking and the age at quitting. In contrast, the subjects in the HPFS reported their average smoking rate during specific age periods before their age at entry (<15, 15–19, 20–29, 30–39, 40–49, 50–59, >60 years). In this case, we only consider the information of the age periods with right end point lower than their age at entry to the study. In addition, we only take the information in the >60 age interval if the age at entry to the HPFS is at least 70 years. We assume that the age at start of HPFS smokers is given by the midpoint of the first age period with a positive smoking rate. If their first positive exposure occurs in the <15 or >60 age interval, then we use 13 or 65 years as the starting age, respectively. Finally, if there is a change in the smoking dose or status between two consecutive age periods, we assume that the change occurred in the midpoint year between them and assign the corresponding age as a change point of the smoking history.

The final step is to calculate the change points after the age of entry to the studies. This is straightforward, since the subjects in both cohorts report their smoking status and dose every two years after their entry into the study. In particular, we compare the smoking intensities between consecutive questionnaires and, if they differ, we assume that the change occurred at the beginning of the midpoint year and assign the corresponding age as a change point of the smoking history. Typical smoking histories for members of the NHS and HPFS are shown in Fig. 1.
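The follow-up part of this construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the analysis: the function names `build_history` and `dose_at` are hypothetical, the category midpoints are taken from the integer category bounds (for example 9.5 for 5–14 cig/day), and the subject is assumed to have smoked at the first reported intensity from the age at start until the first questionnaire.

```python
from bisect import bisect_right

# Intensity (cigarettes/day) assigned to each reported category.
# Midpoints of the integer bounds are an assumption; 50 is the value
# the text assigns to the open-ended top category.
CATEGORY_INTENSITY = {
    "0-4": 2.0, "5-14": 9.5, "15-24": 19.5,
    "25-34": 29.5, "35-44": 39.5, ">45": 50.0,
}


def build_history(start_age, reports):
    """Return change points [(age, intensity), ...] of a piecewise constant d(t).

    reports: list of (age_at_questionnaire, category or None), sorted by age;
    None means the subject reported not smoking.
    """
    history = [(0.0, 0.0)]  # d(0) = 0 until smoking starts
    prev_age, prev_dose = None, 0.0
    for age, category in reports:
        dose = CATEGORY_INTENSITY[category] if category else 0.0
        if prev_age is None:
            # First questionnaire: hold the reported intensity since start_age.
            if dose > 0.0:
                history.append((float(start_age), dose))
        elif dose != prev_dose:
            # Change assumed at the midpoint between consecutive questionnaires.
            history.append(((prev_age + age) / 2.0, dose))
        prev_age, prev_dose = age, dose
    return history


def dose_at(history, t):
    """Evaluate d(t) from the list of change points."""
    ages = [a for a, _ in history]
    return history[bisect_right(ages, t) - 1][1]


# Hypothetical subject: starts at 18, reports every two years from age 40.
example = build_history(18, [(40, "15-24"), (42, "15-24"), (44, "5-14"), (46, None)])
```

The list of change points is enough to evaluate d(t) at any age, which is all that a model with piecewise constant parameters needs from the smoking history.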

Table 1 Classification according to status at baseline (NHS-1976, HPFS-1986)

                       Total      Never     Former    Smokers
NHS
  Subjects             104,493    51,121    24,474    28,898
  Lung cancer cases    1,165      130       134       901
  Avg. follow up       23.15 years
HPFS
  Subjects             46,050     22,431    19,632    3,987
  Lung cancer cases    461        58        247       156
  Avg. follow up       14.93 years


We use likelihood-based methods to estimate parameters of the TSCE model, which are functions of the smoking history. A brief description of the model and details of the likelihood construction are presented below.

The two-stage clonal expansion model

For our analyses, we use a multistage model that acknowledges three phases in the process of carcinogenesis. In the first phase (initiation) a susceptible stem cell acquires one or more mutations resulting in an initiated cell, which has partially escaped growth control. In the second phase (promotion) initiated cells undergo clonal expansion, either spontaneously or in response to endogenous or exogenous promoters. Promotion is an extremely efficient way to bring about malignant conversion because clonal expansion of initiated cells creates a large population of cells that have acquired some of the genetic changes required for malignant transformation. Finally, in the third phase (malignant conversion) one of the initiated cells acquires further mutational changes leading to a malignant cell. The simplest model incorporating these three phases is the TSCE model [7, 8]. A schematic representation of the model is shown in the Appendix.

The TSCE model assumes that normal stem cells become initiated according to a Poisson process with intensity $\mu_0 X$, where $X$ is the number of susceptible stem cells. Initiated cells expand clonally (promotion) via a linear birth and death process with rates $(\alpha, \beta)$. This means that each time that an initiated cell divides, it can produce two initiated cells (with birth rate $\alpha$) or die/differentiate (with death rate $\beta$). Initiated cells can also divide into one initiated and one malignant cell (with rate $\mu_1$). The time between the first malignant cell and diagnosis is modeled either as a constant or gamma-distributed lag.
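For orientation, when all parameters are constant over age the TSCE survival and hazard have a well-known closed form in the TSCE literature. The sketch below implements that constant-parameter case only, without any lag time. It is not the piecewise constant solution used in this analysis, and the function names are hypothetical. Here `nu` stands for $\mu_0 X$ and `gamma` for the net promotion rate $\alpha - \beta - \mu_1$.

```python
import math


def tsce_roots(alpha, gamma, mu1):
    """Roots p < 0 < q with p + q = -gamma and p * q = -alpha * mu1."""
    disc = math.sqrt(gamma * gamma + 4.0 * alpha * mu1)
    return 0.5 * (-gamma - disc), 0.5 * (-gamma + disc)


def tsce_survival(t, nu, alpha, gamma, mu1):
    """Probability of no malignant cell by age t (constant parameters)."""
    p, q = tsce_roots(alpha, gamma, mu1)
    denom = q * math.exp(-p * t) - p * math.exp(-q * t)
    return ((q - p) / denom) ** (nu / alpha)


def tsce_hazard(t, nu, alpha, gamma, mu1):
    """Hazard h(t) = -d/dt log S(t) for the same model."""
    p, q = tsce_roots(alpha, gamma, mu1)
    num = p * q * (math.exp(-q * t) - math.exp(-p * t))
    denom = q * math.exp(-p * t) - p * math.exp(-q * t)
    return (nu / alpha) * num / denom


# Background (non-smoker) values as listed in Table 2 of this article:
# X = 1e7, alpha = 3, mu0 = mu1 = 8.14e-8, gamma = 0.0956.
nu_background = 8.14e-8 * 1e7
hazard_at_70 = tsce_hazard(70.0, nu_background, 3.0, 0.0956, 8.14e-8)
```

For small $t$ the hazard behaves like $\mu_0 X \mu_1 t$, and for large $t$ it levels off, which is the qualitative behavior that shapes the incidence curves discussed later.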

Each of the parameters of the TSCE model may, in principle, be affected by cigarette smoke. Recall that $d(t)$ denotes the cigarette consumption of an individual at age $t$. We assume that each of the identifiable parameters of the model (see Table 2) has a dose–response given by

$$
\theta_{tob}(d(t)) = \theta \left(1 + \theta_c \, d(t)^{\theta_e}\right) \tag{1}
$$

where $\theta$ is the background parameter, and $\theta_c$ and $\theta_e$ are the dose–response coefficients. Previous analyses using the TSCE model [9, 10] of the relationship between smoking and the lung cancer rates suggested that power laws are good models for the smoking dose–response [9]. We estimate the background rates and the cigarette dose–response coefficients for each identifiable parameter in the model.
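As a small worked illustration of expression (1), the function below evaluates the power-law dose–response for one parameter. The function name is hypothetical, and the numbers in the usage line are the NHS promotion values listed in Table 2, used here only as an example input.

```python
def dose_response(theta, theta_c, theta_e, dose):
    """Expression (1): theta_tob(d) = theta * (1 + theta_c * d ** theta_e).

    theta is the background value, theta_c the coefficient, theta_e the power,
    and dose the smoking intensity d(t) in cigarettes per day.
    """
    if dose < 0:
        raise ValueError("dose must be non-negative")
    return theta * (1.0 + theta_c * dose ** theta_e)


# Promotion rate for a 20 cigarettes/day smoker, NHS values from Table 2.
promotion_20 = dose_response(0.0956, 0.1458, 0.5171, 20.0)
```

With a power below one, doubling the dose less than doubles the tobacco term, which is what a sub-linear dose–response means in this parameterization.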

Fig. 1 Example of smoking histories for members of the NHS and HPFS. Top panel, subject of the NHS. Bottom panel, subject of the HPFS. [Figure not reproduced: each panel plots the number of cigarettes per day against age and marks the smoking dose, the age at entry, the follow-up period, and the censoring time. Panel annotations: "Born Jun 1933, died from other cause" and "Born Dec 1931, lung cancer free at the end of follow-up".]

Likelihood function

The likelihood function is the product of individual likelihoods over all the subjects in the cohort(s). Each participant was lung cancer free at the beginning of the study. Hence, to calculate the likelihood, we must condition on the fact that the individuals did not have the clinical disease at their age of entry to the study ($a_{ei}$). Subjects are censored in case of death by any other cause or in case they survive and were never diagnosed with lung cancer until the end of follow-up (year 2000 for our analysis). In addition, we also censor any individuals who were diagnosed with other types of cancer, except non-melanoma skin cancer. Let $a_{li}$ be the censoring or failure (lung cancer diagnosis) age. The individual likelihoods are

$$
L_i(a_{li}, a_{ei}, \bar{\theta}(d_i)) =
\begin{cases}
-\dfrac{S'(a_{li}, \bar{\theta}(d_i))}{S(a_{ei}, \bar{\theta}(d_i))} & \text{for lung cancer cases,} \\[2ex]
\dfrac{S(a_{li}, \bar{\theta}(d_i))}{S(a_{ei}, \bar{\theta}(d_i))} & \text{otherwise,}
\end{cases}
\tag{2}
$$

where $S(t, \bar{\theta}(d_i))$ is the survival probability at age $t$ of an individual with smoking history $d_i$, and $\bar{\theta}(d_i)$ denotes the vector of identifiable model parameters given the smoking history $d_i$ (Note: the prime denotes derivative with respect to $t$). The overall likelihood is then

$$
L = \prod_i L_i(a_{li}, a_{ei}, \bar{\theta}(d_i)), \tag{3}
$$

where the product is taken over all the subjects in the cohort(s).
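The structure of expressions (2) and (3) can be illustrated with a short sketch. Everything here is a simplified assumption: the function names are hypothetical, a single survival function is shared by all subjects (in the analysis it depends on each subject's smoking history), and a Weibull form stands in for the TSCE survival purely as a placeholder.

```python
import math


def individual_log_likelihood(a_entry, a_exit, is_case, survival, density):
    """Log of expression (2), conditioned on being disease free at entry.

    density(t) is -S'(t); survival(t) is S(t).
    """
    log_s_entry = math.log(survival(a_entry))
    if is_case:
        return math.log(density(a_exit)) - log_s_entry
    return math.log(survival(a_exit)) - log_s_entry


def cohort_log_likelihood(subjects, survival, density):
    """Log of expression (3): sum of individual log-likelihoods."""
    return sum(
        individual_log_likelihood(a_e, a_l, case, survival, density)
        for a_e, a_l, case in subjects
    )


# Placeholder survival model (Weibull); NOT the TSCE survival function.
SHAPE, SCALE = 5.0, 120.0


def toy_survival(t):
    return math.exp(-((t / SCALE) ** SHAPE))


def toy_density(t):
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1.0) * toy_survival(t)


# Hypothetical subjects: (age at entry, age at exit, lung cancer case?)
subjects = [(45.0, 68.0, True), (50.0, 74.0, False)]
total_log_likelihood = cohort_log_likelihood(subjects, toy_survival, toy_density)
```

Working on the log scale turns the product in (3) into a sum, which is the form an optimizer would normally maximize.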

The survival function

Exact expressions for the survival and hazard functions of the TSCE with piecewise constant parameters are available in the literature [11]. If we assume a constant or gamma lag time between the appearance of the first malignant cell and clinical diagnosis, the survival function required in expression (3) is given by

$$
S(t, \bar{\theta}(d_i)) =
\begin{cases}
S_2(t - t_{lag}, \bar{\theta}(d_i)) & \text{if lag time is constant,} \\[1ex]
1 - \displaystyle\int_0^t \left(1 - S_2(u, \bar{\theta}(d_i))\right) f(t - u)\, du & \text{if lag time is gamma-distributed,}
\end{cases}
\tag{4}
$$

where $S_2(t, \bar{\theta}(d_i))$ represents the TSCE model survival and $f(\cdot)$ is the gamma density.

Ten-year risk predictions

Models optimized for subjects in the NHS and HPFS cohorts are used to predict 10-year risk estimates for lung cancer incidence. Competing causes of mortality are adjusted for using standard actuarial methods for multiple decrement life tables. All-cause annual risk estimates are extracted from the National Center for Health Statistics [12]. We use the 1989–1991 life tables for both cohorts. Approximate 95% confidence intervals (CI) are calculated by sampling model variables from a Markov Chain Monte Carlo (MCMC) simulation using the Metropolis–Hastings algorithm.

Table 2 Parameter estimates [MLE (MCMC 95% CI)]. A single entry applies to both cohorts

Parameter                                                                NHS                         HPFS
Fixed parameters
  Stem cell population, $X$                                              $10^7$
  Initiated cells’ division rate, $\alpha$                               3
  Gamma-distributed lag time mean                                        5
Background rates
  Initiation & malignant-conversion rate, $\mu_0 = \mu_1$                8.14e-8 (5.51e-8, 1.27e-7)
  Initiated cells’ promotion rate, $\gamma = \alpha - \beta - \mu_1$     0.0956 (0.0772, 0.1106)
  Gamma-distributed lag time std                                         3.28*
Tobacco coefficients
  Tobacco promotion rate coefficient†, $\gamma_c$                        0.1458 (0.1010, 0.1752)     0.1123 (0.0802, 0.1500)
  Tobacco promotion rate power†, $\gamma_e$                              0.5171 (0.4703, 0.5945)
  Tobacco malignant-conversion coefficient, $\mu_{1c}$                   0.2095 (0.1565, 0.6691)     0.5339 (0.2876, 1.6972)
  Tobacco malignant-conversion power, $\mu_{1e}$                         0.4684 (0.1083, 0.5483)
Loglik                                                                   11696.40

* 95% CI not calculated. Fixed at MLE value during MCMC simulation
† Applies also to the initiated cells’ division rate


Ratio of age-adjusted hazards

For any particular smoking history, we use the ratio of age-adjusted hazards as a measure of the lung cancer relative risk between the NHS and HPFS. This ratio is calculated as follows. We compute the TSCE model age-specific incidence (ages 40–80) in each cohort using the corresponding maximum likelihood estimate (MLE) parameters and the specific smoking history of interest. We then adjust for age in each cohort using the 1990 US total white population and compute the ratio of age-adjusted hazards. In order to calculate a 95% CI of the estimated ratio, we obtain independent samples of the model parameters (for the model described in the Joint model section) via Markov Chain Monte Carlo (MCMC) simulations with the Metropolis–Hastings algorithm [13]. For each set of parameters in the MCMC run, we compute the TSCE hazard in the NHS and HPFS (using the specific smoking history of interest), adjust for age in each cohort, and compute the ratio between females and males. We then calculate the 95% CI of the ratio of age-adjusted hazards.
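The calculation just described can be outlined as follows. This is a sketch under assumptions rather than the procedure's actual implementation: the function names, the toy hazard functions, the standard-population weights, and the parameter draws are all hypothetical placeholders, and the interval is a simple percentile interval over the sampled ratios.

```python
def age_adjusted_hazard(hazard, std_pop):
    """Weighted mean of an age-specific hazard over a standard population.

    std_pop maps age -> population count used as the weight.
    """
    total = sum(std_pop.values())
    return sum(n * hazard(age) for age, n in std_pop.items()) / total


def percentile(sorted_values, q):
    """Nearest-rank style percentile of an already sorted list (0 <= q <= 1)."""
    idx = round(q * (len(sorted_values) - 1))
    return sorted_values[min(len(sorted_values) - 1, max(0, idx))]


def adjusted_ratio_ci(hazard_f, hazard_m, param_samples, std_pop, level=0.95):
    """Percentile interval of the female/male ratio of age-adjusted hazards.

    hazard_f(age, params) and hazard_m(age, params) return age-specific
    hazards for one parameter draw (for example one MCMC sample).
    """
    ratios = sorted(
        age_adjusted_hazard(lambda a: hazard_f(a, p), std_pop)
        / age_adjusted_hazard(lambda a: hazard_m(a, p), std_pop)
        for p in param_samples
    )
    tail = (1.0 - level) / 2.0
    return percentile(ratios, tail), percentile(ratios, 1.0 - tail)


# Hypothetical inputs, for illustration only.
STD_POP = {40: 5.0, 50: 4.0, 60: 3.0, 70: 2.0, 80: 1.0}
SAMPLES = [{"kf": 1.0 + 0.01 * i, "km": 1.0} for i in range(100)]


def toy_hazard_f(age, p):
    return p["kf"] * 1e-9 * age ** 4


def toy_hazard_m(age, p):
    return p["km"] * 1e-9 * age ** 4


low, high = adjusted_ratio_ci(toy_hazard_f, toy_hazard_m, SAMPLES, STD_POP)
```

Adjusting each cohort to the same standard population before taking the ratio keeps differences in age structure from being mistaken for a difference between women and men.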

Estimation procedure

Estimation of the parameters is done via maximum likelihood methods. The background rates and the dose–response relationships are estimated by maximizing the likelihood for the observed cancer incidence using the piecewise constant cigarette exposure of each individual. The likelihood function calculation and its maximization are done by High Performance Fortran routines. The Nelder–Mead simplex and the modified Davidon–Fletcher–Powell algorithms are used for the optimization. Gauss–Legendre quadrature is used for the integration required for the computation of the survival function when the gamma-distributed lag time (time from malignant transformation to diagnosis) is used.
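To make the quadrature step concrete, the sketch below evaluates the gamma-lag case of expression (4) with a composite five-point Gauss–Legendre rule. It is written in Python for illustration and is not the Fortran routine used in the analysis. The function names, the number of panels, and the parameterization of the gamma density by mean and standard deviation are assumptions of this sketch, and `s2` is any user-supplied function standing in for the TSCE survival $S_2$.

```python
import math

# Standard 5-point Gauss-Legendre nodes and weights on [-1, 1].
GL_NODES = (-0.9061798459386640, -0.5384693101056831, 0.0,
            0.5384693101056831, 0.9061798459386640)
GL_WEIGHTS = (0.2369268850561891, 0.4786286704993665, 0.5688888888888889,
              0.4786286704993665, 0.2369268850561891)


def gamma_pdf(x, mean, std):
    """Gamma density with the given mean and standard deviation."""
    if x <= 0.0:
        return 0.0
    shape = (mean / std) ** 2
    scale = std * std / mean
    return x ** (shape - 1.0) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)


def survival_with_gamma_lag(t, s2, mean, std, panels=40):
    """S(t) = 1 - int_0^t (1 - S2(u)) f(t - u) du, by composite quadrature."""
    if t <= 0.0:
        return 1.0
    width = t / panels
    integral = 0.0
    for k in range(panels):
        mid = (k + 0.5) * width
        for node, weight in zip(GL_NODES, GL_WEIGHTS):
            u = mid + 0.5 * width * node  # map node from [-1, 1] to the panel
            integral += 0.5 * width * weight * (1.0 - s2(u)) * gamma_pdf(t - u, mean, std)
    return 1.0 - integral


# Lag mean 5 and std 3.28 as listed in Table 2; s2 here is a toy placeholder.
example = survival_with_gamma_lag(70.0, lambda u: math.exp(-1e-9 * u ** 4), 5.0, 3.28)
```

Splitting the interval into panels keeps the rule accurate near $u = t$, where the gamma density changes quickly.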

We used two estimation procedures. In the first, we fit the background parameters to the never-smokers only and then, keeping them constant, fit the model to the entire cohort to optimize the dose–response parameters. In the second, we fit the model to the entire cohort and estimate all the parameters simultaneously. We find that both approaches lead to similar fits in terms of the likelihood function. However, the first provides better fits to the number of cancer cases in each sub-group, so it is preferred to the latter.¹ All the results presented here are based on the first estimation procedure.

Results

Not all the parameters in the TSCE model are identifiable [11]. We use the specific parameterization shown in Table 2. To start, we assume that all the TSCE model parameters can be affected by cigarette exposure. Using likelihood-ratio tests, we reduce the model to describe the cohorts’ lung cancer incidence with as few parameters as possible. We find that in both cohorts, only the net cell proliferation and the malignant conversion rate have a statistically significant dose–response.² In addition, we find that using a gamma-distributed lag time improves the model fit significantly in both cohorts.³ Table 2 shows the reduced set of parameters. The corresponding 95% CI are constructed via MCMC simulations with the Metropolis–Hastings algorithm [13]. The TSCE model describes lung cancer incidence in both cohorts well, as can be seen from Fig. 2.
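The size of the improvement reported for the gamma-distributed lag (a gain of eight log-likelihood points with one additional parameter) can be put on the usual likelihood-ratio scale. The helper below is a hypothetical illustration that assumes the standard chi-square approximation with one degree of freedom applies; that approximation is an assumption of this sketch, not a statement about how the test was carried out.

```python
import math


def lrt_p_value_one_df(loglik_gain):
    """P-value of a likelihood-ratio test for one extra parameter.

    The statistic is 2 * (log-likelihood gain), compared with a chi-square
    distribution with 1 degree of freedom, whose upper tail probability is
    erfc(sqrt(x / 2)).
    """
    if loglik_gain < 0:
        raise ValueError("log-likelihood gain must be non-negative")
    statistic = 2.0 * loglik_gain
    return math.erfc(math.sqrt(statistic / 2.0))


p_gamma_lag = lrt_p_value_one_df(8.0)  # gain of eight log-likelihood points
```

For comparison, a gain of about 1.92 log-likelihood points corresponds to the conventional 5% threshold for one added parameter.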

Independent models

First, we fit our models to both cohorts independently. In both cohorts we find that the primary etiological mechanism for lung cancer appears to be smoking-related promotion (increased clonal expansion rate). The fitted models have a highly significant sub-linear dose–response on the promotion of premalignant lesions. These results are in agreement with a previous joint analysis of the lung cancer mortality in the British doctors’ and the American Cancer Society CPS-I and CPS-II cohorts [10]. Interestingly, the results are closer to the fits to the CPS-II cohort, which was roughly contemporaneous with NHS and HPFS, than to fits to the earlier CPS-I and British doctors’ cohorts. The NHS, HPFS and CPS-II cohorts showed a stronger dose–response of tobacco on promotion than the earlier cohorts, but a reduced effect on initiation. These differences may in part be explained by changes in cigarette composition, with higher levels of nitrosamines in the newer cigarettes acting as promoters, while the lower tar levels may be associated with the lower apparent initiation rate. We also find a significant dose–response in the malignant conversion of premalignant lesions in the NHS and HPFS. This was not seen in CPS-II, possibly because the data did not include follow-up for changes in smoking intensity. A dose–response on malignant conversion has relatively short-term effects on incidence rates.

1 We also tested disjoint models for never and ever smokers. However, these models also lead to similar fits, but with a larger number of parameters.

2 However, a smoking effect on initiation almost doubling the background rate is still consistent with the data.

3 A gain of eight log-likelihood points with only one more parameter.

Interestingly, all parameter estimates are similar in the NHS and HPFS cohorts, suggesting that a common model could describe the incidence in both.

Joint model

There are reports in the literature suggesting that, for a given level of smoking, women are at higher risk of lung cancer than men [1–3, 5, 6]. However, a recent analysis of the NHS and HPFS by Bain et al. [4] found no statistically significant gender differences in the lung cancer rates among smokers for a given level of smoking. In a later correction to the original publication, Bain et al. [14] reported a gender difference among ex-smokers, with the risk in women being 1.5 relative to men. Wakelee et al. [15] suggested in a recent analysis of several large cohort studies, including the NHS and HPFS, that the lung cancer incidence among never smokers is higher in women. However, although their estimated age-adjusted lung cancer incidence among never smokers is slightly higher in the NHS than in the HPFS, they do not reject the equality of the never smoker lung cancer rates in the two cohorts.

In order to address the issue of gender differences, we explored a joint model in the two cohorts. Multistage models allow us to test for specific gender differences in the initiation, promotion and malignant conversion rates of lung cancer. Using likelihood-ratio tests, we cannot reject the equality of the background parameters between females and males, although we can reject the equality of all the model parameters. In particular, a model with different tobacco-induced promotion and malignant conversion coefficients between women and men is the overall preferred model. Table 2 shows the parameter estimates of the preferred joint model. All the figures in this article are obtained using the parameter estimates of the preferred joint model.

The NHS and HPFS cohorts contain information on never, current, and former smokers. Figure 2 shows the lung cancer incidence among never, former, and current smokers in both cohorts and the model predictions. The bottom panels in Fig. 2 show the number of lung cancer cases in the NHS and HPFS as a function of years since quitting.

Discussion

Methods of analysis based on ideas of multistage carcinogenesis are fully parametric and allow complex patterns of exposure to multiple covariates to be explicitly considered [9]. In the analyses reported in this article, we have explicitly considered individual smoking histories, including age at start of smoking, changes in levels of smoking, and age at quitting among ex-smokers. Models incorporating detailed smoking information on the individual level are useful in exploring the consequences of intervention strategies to modify smoking habits. Moreover, being biologically based, multistage models allow the investigation of the effects of smoking on lung cancer initiation, promotion and malignant conversion. Hence, multistage models provide a natural framework to evaluate the potential benefits of chemo-prevention and pharmacological intervention strategies based on mode of action of the intervention. Finally, analyses based on multistage models begin with a completely different set of assumptions and therefore complement the traditional approaches. In particular, these analyses do not assume proportionality of hazards, a very strong assumption that appears to be inappropriate in the case of lung cancer and smoking [16].

Fig. 2 NHS and HPFS lung cancer incidence. Solid line, maximum likelihood estimate from joint fit of the NHS and HPFS; dashed lines, MCMC 95% CI. The incidence is calculated by summing individual one-year integrated hazards over all subjects at risk. Stars show the ratio of observed lung cancer cases to person-years at risk in five-year bins, with 95% confidence bars based on Poisson assumptions. Please note the different scales in the panels. [Figure not reproduced: panels for never smokers, ex-smokers, and smokers in the HPFS and in the NHS plot rates per 100,000 against age (40–80); two further panels are plotted against years since quitting.]

Previous analyses using multistage models

The two-stage clonal expansion model has been used to describe the lung cancer incidence and mortality in several cohort and case–control studies [9, 10, 17–19]. In all of them, smoking-related promotion has been found to be the primary etiological mechanism of lung carcinogenesis. Interestingly, analyses of older datasets have also shown an effect of smoking on lung cancer initiation and no effect on malignant conversion [9, 10]. However, exactly the opposite has been found in more recent datasets [18, 19]. In particular, Heidenreich et al. [18] found in a case–control study in Germany that smoking has significant effects on promotion and malignant conversion and no effects on initiation. More recently, Schöllnberger et al. [19] found similar patterns in a large cohort study carried out in 10 European countries. Interestingly, Schöllnberger et al. reported that a common model described lung cancer incidence in males and females in the European Prospective Investigation into Cancer and Nutrition (EPIC). They concluded that gender differences in lung cancer risk are due entirely to differences in smoking habits. Hazelton et al. [10] also found a limited effect of tobacco on lung cancer initiation in the CPS-I study; however, no effect on malignant conversion was seen in that cohort. These differences may in part be explained by changes in cigarette composition, with higher levels of nitrosamines in the newer cigarettes acting as promoters, while the lower tar levels may be associated with the lower apparent initiation rate. Additionally, the smoking information available in the older cohorts may not have been detailed enough to detect an effect on malignant conversion.

Incidence among life-long non-smokers

Our analyses indicate that the incidence of lung cancer among life-long non-smokers is virtually identical in the two cohorts. The incidence curves predicted by our model along with observed incidence rates in both cohorts are shown in the top panels of Fig. 2 and in the left panel of Fig. 3a.

Incidence among continuing smokers

Figure 3a shows the age-specific incidence curves generated by the joint model for female and male smokers of 20 and 40 cigarettes per day. The second panels of Fig. 2 show the age-specific incidence rates among continuing smokers in both cohorts along with the incidence curves generated by our model.

Fig. 3 Age-specific lung cancer incidence rates and relative risk of smoking and quitting. Maximum likelihood lung cancer incidence from joint model among 20 and 40 cigarette smokers in the NHS and HPFS. Smoking starts at age 20. Former smokers quit at ages 30 or 50. (a) Predicted lung cancer incidence from joint model for never and current smokers in the NHS and HPFS. Please note the different scales in the panels. (b) Relative risk among smokers (current/never smoker) and among ex-smokers (former/current smoker) for 20 and 40 cigarettes. [Figure not reproduced: (a) two panels of rates per 100,000 against age (40–80), one for never smokers (NHS, HPFS, 95% CI) and one for smokers (NHS and HPFS, 20 and 40 cigarettes per day); (b) four panels against age (20–80) showing the smoking relative risk and the quitting relative risk in the NHS and in the HPFS for 20 and 40 cigarettes per day.]

The relative hazard associated with smoking 20 and 40

cigarettes per day in each cohort is shown in the top panels

of Fig. 3b. It is clear from this figure that the relative risks

associated with smoking are strongly modified by duration

of smoking. That this observation is not an artifact of our

model can be seen from the directly computed rate ratios in

the Cancer Prevention Study I (Burns et al. [20], Table 11),

which show a similar concave-down picture not only

for lung cancer but also for other causes of mortality

associated with cigarette smoking. The initial increase in

RR with duration of smoking can be directly attributed to

the strong influence of tobacco on promotion. The later

decline can be attributed to the strong increase in non-

smoker incidence rates of lung cancer with age with a

concomitant leveling off of the incidence rates among

smokers predicted by the model. The strong modification

of RR by duration of smoking suggests that the propor-

tional hazards model may not be the appropriate tool for

analyses of these data.

A common model for lung cancer incidence in the NHS

and HPFS (identical model parameters) is rejected by the

likelihood-ratio test (see Joint Model section). The best

fitting model indicates that smoking-induced promotion is

somewhat higher among females, whereas smoking-

induced malignant conversion is somewhat lower. As a

result of these opposing effects on smoking-induced lung

cancer risk, the incidence curves are rather similar as

shown in the second panels of Fig. 2 and in the right panel

of Fig. 3a. The evidence of a larger effect of smoking on

promotion among females is consistent with a synergistic

effect with estrogens [2, 6], and with effects of gastrin-

releasing peptide (GRP) expression in females [21]. GRP

stimulates cell proliferation in tumors [22] and appears to

be expressed more frequently in female than in male non-

smokers and activated earlier in women in response to

tobacco exposure than in men [21]. The hazard among

females relative to that among males for smokers of 20 and

40 cigarettes per day is shown in Fig. 4. This figure shows that the relative risk increases gradually with duration of smoking, but the confidence bands generated by MCMC methods include 1. For smokers of 20 cigarettes per day, the ratio of age-adjusted female to male rates is 1.1 (95% CI = 0.77–1.29; see footnote 4) and is not statistically significant, a

finding that is consistent with that reported in Bain et al.

(2004). For smokers of 40 cigarettes per day, the ratio of

age-adjusted female to male rates is 1.2 (95% CI = 0.80–

1.64).
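The age-adjusted female-to-male ratios quoted above can be illustrated with a minimal sketch of direct age standardization. It assumes that the adjustment is a weighted average of age-specific rates with standard-population weights; the age groups, weights, and rates below are hypothetical placeholders, not values from the NHS, the HPFS, or the 1990 US population.

```python
def age_adjusted_rate(rates_by_age, standard_weights):
    """Directly standardized rate: weighted mean of age-specific rates."""
    total_weight = sum(standard_weights[age] for age in rates_by_age)
    weighted_sum = sum(rates_by_age[age] * standard_weights[age] for age in rates_by_age)
    return weighted_sum / total_weight

# Hypothetical standard-population weights and rates per 100,000 (illustration only)
standard_weights = {50: 0.40, 60: 0.35, 70: 0.25}
female_rates = {50: 100.0, 60: 300.0, 70: 600.0}
male_rates = {50: 90.0, 60: 280.0, 70: 520.0}

ratio = (age_adjusted_rate(female_rates, standard_weights)
         / age_adjusted_rate(male_rates, standard_weights))
print(f"female/male age-adjusted rate ratio: {ratio:.2f}")
```

Because both sexes are weighted by the same standard population, the ratio reflects differences in age-specific rates rather than differences in the age structure of the two cohorts.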

Incidence among ex-smokers

The bottom panels of Fig. 2 show the incidence rate among

ex-smokers as a function of time since quitting. The model

[Fig. 4 graphic. Panels: Women/Men hazard ratio versus age for 20-cig and 40-cig smokers (model and 95% CI), and for 20-cig and 40-cig ex-smokers who quit at age 50.]

Fig. 4 Women/Men hazard

ratio for current and former

smokers. Solid line Maximum

likelihood hazard ratio from

independent fits to the NHS and

HPFS. Dashed lines, MCMC

95% CI. Left panels. Women/

Men hazard ratio for smokers.

Right panels. Women/Men

hazard ratio for ex-smokers

(quit at age 50). Smoking in all

panels starts at age 20

4 Age-adjusted to the 1990 US total population. Please see Materials

and Methods for details.


predictions describe the data well in both cohorts except for

the first few years after quitting. We attribute this dis-

crepancy in the first few years to quitters who stopped

smoking because they had developed symptoms of lung

cancer. This phenomenon is well known [23, 24]. The effect

of smoking on the rate of malignant conversion implies a

rather quick decrease in risk after quitting, and the effect on

the rate of promotion implies a continuing decrease in risk

over a prolonged period of time as seen in previous analyses

of mortality data (Hazelton et al. [10]). The bottom panels of Fig. 3b show the decrease in lung cancer incidence among

ex-smokers relative to that among continuing smokers. The

pattern of decrease in both cohorts is consistent with

that reported for mortality by Hazelton et al. [10], by Peto

et al. [25] and by Rachet et al. [16].

Figure 4 shows the female to male hazard ratio for

ex-smokers. This ratio is higher than the ratio of hazards

for continuing smokers (left panels of the figure). The

hazard ratio quickly increases to about 1.5 and remains

approximately constant. It is important to mention that

these calculations also depend on the assumed age at start

(age 20) and age at quitting (age 50). The confidence

bounds on the ratio indicate that it is borderline significant, consistent with the report by Bain et al. [14]. For

ex-smokers of 20 cigarettes per day, the ratio of age-adjusted

female to male rates is 1.35 (95% CI = 0.99–1.56).

For ex-smokers of 40 cigarettes per day, the ratio of age-

adjusted female to male rates is 1.48 (95% CI = 1.08–

1.90).

The estimated benefits of smoking cessation depend

largely on the available information at older ages, where

longer durations of both abstinence and smoking are

observed. The age-distribution of individuals differs

between the two cohorts, with a larger proportion of older

individuals present in the HPFS. Therefore, it is plausible

that the lower risk among the ex-smokers in the HPFS

predicted by the model is attributable, at least in part, to the

difference in age-distribution.

Ten-year risk predictions

Ten-year risk predictions with 95% CIs are shown in Table 3 for different smoking patterns among continuing smokers and for former smokers who quit at the beginning of the 10-year risk-projection period. These calculations may overestimate the 10-year risk of lung cancer incidence for heavy smokers, because population-based annual life tables [12] were used to adjust for competing risk (no life tables for different smoking levels were available). The

calculation of risk for smokers who quit at the beginning of

the 10-year interval was made by assuming that the

Table 3 The 10-year risk projections for smokers who smoke for 25, 40, or 50 years and continue to smoke or quit at ages 55, 65, or 75 years, based on models for White male and female smokers in the NHS and HPFS cohorts [% risk (95% CI)]

      25 years                            40 years                            50 years
Age   Quit              Still smoking     Quit              Still smoking     Quit              Still smoking

NHS: 20-cig smokers
55    0.8 (0.6–1.2)     1.7 (0.9–3.1)     2.0 (1.6–2.7)     3.8 (2.3–6.5)     *                 *
65    2.0 (1.5–2.7)     3.8 (2.0–6.6)     4.6 (3.7–5.7)     7.9 (4.8–12.0)    6.7 (5.6–8.0)     10.7 (7.1–15.1)
75    4.0 (2.9–5.2)     6.8 (3.7–10.6)    7.8 (6.3–9.5)     12.0 (7.7–16.7)   10.3 (8.5–12.6)   14.9 (10.2–20.1)

NHS: 40-cig smokers
55    1.8 (1.3–2.7)     4.2 (2.0–7.6)     5.4 (4.0–7.0)     10.4 (6.2–15.7)   *                 *
65    4.1 (2.9–5.7)     8.4 (4.5–13.8)    10.0 (8.3–12.15)  16.9 (11.3–23.0)  12.3 (9.6–15.7)   18.8 (13.2–25.6)
75    7.3 (5.4–9.4)     12.7 (7.4–18.7)   13.7 (11.2–16.9)  20.4 (14.4–27.3)  14.9 (10.6–20.7)  20.8 (14.7–29.1)

HPFS: 20-cig smokers
55    0.7 (5.2–1.1)     1.8 (0.8–3.1)     1.5 (1.0–2.0)     3.4 (1.6–5.4)     *                 *
65    1.7 (1.2–2.3)     3.8 (1.8–5.9)     3.2 (2.4–4.1)     6.6 (3.6–9.4)     4.3 (3.5–5.3)     8.3 (5.3–11.2)
75    3.1 (2.3–4.2)     5.9 (3.2–8.6)     5.2 (4.2–6.4)     9.0 (6.1–12.2)    6.5 (5.4–7.8)     10.7 (7.9–14.1)

HPFS: 40-cig smokers
55    1.4 (0.8–2.2)     3.8 (1.4–6.4)     3.4 (2.1–4.7)     7.9 (3.5–11.5)    *                 *
65    3.1 (1.9–4.2)     7.1 (3.2–10.5)    6.3 (4.5–7.8)     12.4 (7.2–16.8)   7.8 (6.1–9.6)     13.9 (10.0–18.3)
75    5.1 (3.5–6.6)     9.6 (5.3–13.6)    8.5 (6.9–10.5)    14.2 (10.3–18.7)  9.5 (6.4–12.2)    14.7 (10.3–19.6)

Note: Life tables are used to adjust for death from competing causes. Model-based 10-year risks are shown for each smoking pattern, with

MCMC 95% CIs. Projections for individuals who quit smoking and continue to abstain for the following 10 years assume the model variables

revert to background values following smoking cessation. Asterisks are placed in cells that correspond to unrealistically early ages for starting

smoking. These rates are not generalizable and are probably lower than expected for the general population, as the cohorts are more educated and

healthy


dose–response functions return to background levels when

smoking stops. These estimates show the benefit of quitting

for any dose and duration of smoking. Risk estimates are

somewhat higher in the NHS than in the HPFS. These

estimates are consistent with 10-year risk projections based

on data from the Carotene Retinol Efficacy Trial (Table 2

of Bach et al. [26]). Estimates in Table 3 are higher than

10-year projections of lung cancer mortality risk based on

the CPS-I and CPS-II cohorts (Table 3 in Hazelton et al.

[10]).
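The role of the competing-risk adjustment in these projections can be sketched as follows. This is a generic competing-risks calculation under stated assumptions, not the authors' code: the two hazard functions are hypothetical stand-ins for the TSCE lung cancer hazard and the life-table mortality rate, and hazards are treated as constant within each monthly step.

```python
import math

def ten_year_risk(lc_hazard, other_hazard, start_age, years=10, steps_per_year=12):
    """Cumulative lung cancer incidence over `years`, with death from other causes competing."""
    dt = 1.0 / steps_per_year
    at_risk = 1.0   # probability of being alive and free of lung cancer
    risk = 0.0
    for i in range(years * steps_per_year):
        age = start_age + (i + 0.5) * dt          # midpoint of the step
        h, m = lc_hazard(age), other_hazard(age)
        total = h + m
        if total > 0.0:
            p_any = 1.0 - math.exp(-total * dt)   # probability of leaving the at-risk state
            risk += at_risk * p_any * h / total   # share of exits that are lung cancer
            at_risk *= 1.0 - p_any
    return risk

# Hypothetical hazards per person-year (illustration only)
def lc_hazard(age):
    return 0.004 * math.exp(0.06 * (age - 65.0))

def other_hazard(age):
    return 0.015 * math.exp(0.09 * (age - 65.0))

risk_with_competing = ten_year_risk(lc_hazard, other_hazard, 65)
risk_without = ten_year_risk(lc_hazard, lambda age: 0.0, 65)
print(f"{100 * risk_with_competing:.1f}% vs {100 * risk_without:.1f}% without competing mortality")
```

In this sketch, raising the other-cause mortality lowers the projected lung cancer risk, which is why population life tables that understate mortality among heavy smokers would tend to overstate their 10-year risk.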

Conclusions

We conclude that the risk of lung cancer is similar among

non-smoking and smoking men in the HPFS and women in

the NHS, but that the lung cancer risk among ex-smokers is

higher in the NHS. Within the framework of the TSCE

model, this difference can be attributed to higher smoking-

related promotion in the NHS cohort. However, it is

plausible that this is just an artifact produced by the dif-

ference in age-distribution between the two cohorts. In both

cohorts, we find that the main effect of cigarette smoke is

on the promotion of premalignant lesions. This is consis-

tent with previous analyses of several cohort and case–

control studies using the TSCE model [9, 10, 17–19]. The

relative risk of smoking is strongly dependent on duration

of smoking. For a smoker who begins to smoke before the

age of 20, the RR increases to about age 70 and declines

thereafter. This pattern is consistent with that observed in

other studies [10, 20]. Among ex-smokers, the relative risk

of former versus current smokers appears to decrease more

strongly at higher smoking levels. This finding is consistent

with the analysis of CPS-I, CPS-II and the British Doctors

cohorts in Hazelton et al. [10] and with the analysis of a

large case–control study in Rachet et al. [16].

Acknowledgments We thank the Cancer Intervention and Surveil-

lance Modeling Network (CISNET) Group, Dr. Anup Dewanji and

Dr. Jihyoun Jeon for useful suggestions. We acknowledge support

from the NIH grants RO1 CA047658 and UO1 CA97415. Financial

support: NIH grants RO1 CA047658 and UO1 CA97415.

Appendix

In this section we show how to calculate explicitly the individual likelihoods (Eq. 2) as a function of the model parameters and the individual's smoking history. For simplicity, we assume a constant lag time between the appearance of the first malignant cell and diagnosis, so the survival at any age $t$, $S(t)$, is equal to the TSCE model survival at age $t - t_{\mathrm{lag}}$, $S_2(t - t_{\mathrm{lag}})$ (see Eq. 4). We use the expressions derived in Heidenreich et al. [11] for the TSCE model survival in the case of piecewise-constant parameters. From Heidenreich et al. [11], we have that the survival function and its derivative at any age $t$ are

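$$S(t) = S_2(t - t_{\mathrm{lag}}) = \exp\left\{ \sum_{j=1}^{k} \frac{\mu_{0,j} X}{\alpha_j} \ln\left[ \frac{q_j - p_j}{f_j(t_{j-1}, t_k)} \right] \right\}, \tag{5}$$

$$S'(t) = S_2'(t - t_{\mathrm{lag}}) = S_2(t - t_{\mathrm{lag}}) \cdot \sum_{j=1}^{k} \left( -\frac{\mu_{0,j} X}{\alpha_j} \right) \frac{\partial}{\partial t_k} \ln\left( f_j(t_{j-1}, t_k) \right), \tag{6}$$

where $k$ is the number of age-periods with different smoking-dose before age $t_k \equiv t - t_{\mathrm{lag}}$, $[t_{j-1}, t_j]$ denote the end-points of the $j$-th age-period, $d_j$ is the smoking-dose during the $j$-th age-period, $\mu_{0,j}$, $\alpha_j$, $g_j$, $\mu_{1,j}$ denote the parameter values during the $j$-th age-period and

$$g_j = g\,(1 + g_c d_j^{\,g_e}), \qquad \alpha_j = \alpha\,(1 + g_c d_j^{\,g_e}), \tag{7}$$

$$\mu_{1,j} = \mu_1\,(1 + \mu_{1c} d_j^{\,\mu_{1e}}), \qquad \mu_{0,j} = \mu_0\,(1 + \mu_{0c} d_j^{\,\mu_{0e}}), \tag{8}$$

$$p_j, q_j = \frac{1}{2}\left( -g_j \mp \sqrt{g_j^2 + 4\alpha_j \mu_{1,j}} \right), \tag{9}$$

$$\tilde{y}_k = 0, \qquad \tilde{y}_{j-1} = \frac{\alpha_{j-1}}{\alpha_j}\, \frac{(\tilde{y}_j - p_j)\, q_j\, e^{q_j (t_{j-1} - t_j)} + (q_j - \tilde{y}_j)\, p_j\, e^{p_j (t_{j-1} - t_j)}}{f_j(t_{j-1}, t_k)}, \tag{10}$$

$$f_j(t_{j-1}, t_k) = (\tilde{y}_j - p_j) \exp\{ q_j (t_{j-1} - t_j) \} + (q_j - \tilde{y}_j) \exp\{ p_j (t_{j-1} - t_j) \}, \tag{11}$$

$$\frac{\partial}{\partial t_k} f_k(t_{k-1}, t_k) = \left[ \exp\{ q_k (t_{k-1} - t_k) \} - \exp\{ p_k (t_{k-1} - t_k) \} \right] p_k q_k, \tag{12}$$

$$\frac{\partial}{\partial t_k} f_j(t_{j-1}, t_k) = \left[ \exp\{ q_j (t_{j-1} - t_j) \} - \exp\{ p_j (t_{j-1} - t_j) \} \right] \frac{\partial}{\partial t_k} \tilde{y}_j, \tag{13}$$

$$\frac{\partial}{\partial t_k} \tilde{y}_{k-1} = \frac{\alpha_{k-1}}{\alpha_k}\, \frac{(q_k - p_k)^2\, e^{(p_k + q_k)(t_{k-1} - t_k)}}{\left( f_k(t_{k-1}, t_k) \right)^2}\, p_k q_k, \tag{14}$$

$$\frac{\partial}{\partial t_k} \tilde{y}_{j-1} = \frac{\alpha_{j-1}}{\alpha_j}\, \frac{(q_j - p_j)^2\, e^{(p_j + q_j)(t_{j-1} - t_j)}}{\left( f_j(t_{j-1}, t_k) \right)^2}\, \frac{\partial}{\partial t_k} \tilde{y}_j. \tag{15}$$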

To calculate the individual likelihood (Eq. 2), we only

need to evaluate expressions (5) and (6) at the age at entry

(t = ae) and at the age at last (t = al) accordingly.
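The backward recursion in expressions (5)–(15) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' Fortran or R code: the function names, the tuple layout of `periods`, and all numerical values (background rates and dose–response coefficients) are assumptions chosen for demonstration and are not the Table 2 estimates.

```python
import math

def dose_response(base, coef, expo, dose):
    """Eqs. 7-8: background value scaled by 1 + coef * dose**expo."""
    return base * (1.0 + coef * dose ** expo)

def tsce_log_survival_and_hazard(periods, X=1.0e7):
    """periods: age-ordered list of (t_start, t_end, mu0, alpha, g, mu1);
    the last t_end plays the role of t_k = t - t_lag.
    Returns (ln S_2(t_k), hazard -S_2'(t_k) / S_2(t_k))."""
    y = 0.0      # y~_k = 0 (Eq. 10)
    dy = None    # d y~_j / d t_k; undefined until the last period is processed
    log_s = 0.0
    hazard = 0.0
    for j in range(len(periods) - 1, -1, -1):              # backward over age-periods
        t0, t1, mu0, alpha, g, mu1 = periods[j]
        root = math.sqrt(g * g + 4.0 * alpha * mu1)
        p, q = 0.5 * (-g - root), 0.5 * (-g + root)         # Eq. 9
        eq, ep = math.exp(q * (t0 - t1)), math.exp(p * (t0 - t1))
        f = (y - p) * eq + (q - y) * ep                     # Eq. 11
        log_s += mu0 * X / alpha * math.log((q - p) / f)    # term of Eq. 5
        inner = p * q if dy is None else dy                 # Eq. 12 (last period) or Eq. 13
        hazard += mu0 * X / alpha * (eq - ep) * inner / f   # term of Eq. 6, divided by -S_2
        if j > 0:
            ratio = periods[j - 1][3] / alpha               # alpha_{j-1} / alpha_j
            y_prev = ratio * ((y - p) * q * eq + (q - y) * p * ep) / f   # Eq. 10
            dy = ratio * (q - p) ** 2 * eq * ep / f ** 2 * inner         # Eqs. 14-15
            y = y_prev
    return log_s, hazard

# Illustrative background values and hypothetical dose-response coefficients
MU0, ALPHA, G, MU1 = 8.14e-8, 3.0, 0.096, 8.16e-8
G_C, G_E, MU1_C, MU1_E = 0.05, 0.8, 0.02, 1.0

def period(t0, t1, dose):
    """One age-period with promotion, division and malignant conversion adjusted for dose."""
    return (t0, t1, MU0,
            dose_response(ALPHA, G_C, G_E, dose),
            dose_response(G, G_C, G_E, dose),
            dose_response(MU1, MU1_C, MU1_E, dose))

never = [period(0, 20, 0), period(20, 38, 0)]
smoker = [period(0, 20, 0), period(20, 38, 30)]
log_s_never, hz_never = tsce_log_survival_and_hazard(never)
log_s_smoker, hz_smoker = tsce_log_survival_and_hazard(smoker)
print(math.exp(log_s_never), math.exp(log_s_smoker))
```

Splitting a period with unchanged parameters into two sub-periods leaves the result unchanged, which gives a convenient consistency check on any implementation of the recursion.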

As an example, we calculate the likelihood contribution

of the individual whose smoking history is illustrated in the

top panel of Fig. 1 at the maximum likelihood estimates of

the joint-model (Table 2). The individual entered the NHS

study at age 43 (approx) and died at age 63 from other

causes. Hence, we must calculate the TSCE model survival


at ages 43 - tlag = 38 and 63 - tlag = 58 to obtain her

individual likelihood (Eq. 2). This individual started

smoking at age 20, at a constant rate of 30 cigarettes per

day, and continued with this pattern until age 42. Hence, to

calculate the TSCE model survival at age 38, we have to

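consider two different age-periods, namely $[0, 20] \equiv [t_0, t_1]$ and $[20, 38] \equiv [t_1, t_2]$. From expression (5), we have that the survival at age 43 is

$$S(43) = S_2(38) = \exp\left\{ \sum_{j=1}^{2} \frac{\mu_{0,j} X}{\alpha_j} \ln\left[ \frac{q_j - p_j}{f_j(t_{j-1}, 38)} \right] \right\} \tag{16}$$

$$= \exp\left\{ \frac{\mu_{0,1} X}{\alpha_1} \ln\left[ \frac{q_1 - p_1}{f_1(0, 38)} \right] + \frac{\mu_{0,2} X}{\alpha_2} \ln\left[ \frac{q_2 - p_2}{f_2(20, 38)} \right] \right\}. \tag{17}$$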

In this particular example, the parameter values during the first age-period are equal to the background parameters (i.e., $d_1 = 0$). In contrast, during the second period, we have to multiply the promotion rate $g$ and the malignant conversion rate $\mu_1$ by their corresponding dose–responses (with $d_2 = 30$). We also multiply the initiated cells' division rate, $\alpha$, by the dose–response of $g$ in order to keep them consistent. In this model, there is no dose–response on the initiation rate, $\mu_0$, so it is constant across all age-periods. Using Eqs. 5 and 7–15 and the parameter estimates from Table 2, we obtain that

$$\ln[S_2(38)] = \frac{(8.14 \times 10^{-8})(10^{7})}{3} \ln\frac{2.55 \times 10^{-6} - (-0.0960)}{0.0959} + \frac{(8.14 \times 10^{-8})(10^{7})}{5.54} \ln\frac{5.18 \times 10^{-6} - (-0.1765)}{0.1766} = -0.00116. \tag{18}$$

So the survival at age 43 is $S(43) = S_2(38) = 0.9988$. To calculate the TSCE model survival at age 58, we need to consider the following age-periods: $[0, 20] \equiv [t_0, t_1]$, $[20, 42] \equiv [t_1, t_2]$, $[42, 44] \equiv [t_2, t_3]$, $[44, 46] \equiv [t_3, t_4]$, $[46, 48] \equiv [t_4, t_5]$, $[48, 55] \equiv [t_5, t_6]$, $[55, 57] \equiv [t_6, t_7]$ and $[57, 58] \equiv [t_7, t_8]$. At each age-period, we must adjust the model parameters by the corresponding dose–response. As before, we calculate the survival at age 63 using Eqs. 5 and 7–15:

$$S(63) = S_2(58) = \exp\left\{ \sum_{j=1}^{8} \frac{\mu_{0,j} X}{\alpha_j} \ln\left[ \frac{q_j - p_j}{f_j(t_{j-1}, 58)} \right] \right\} = 0.9768. \tag{19}$$

So the likelihood contribution of this individual is equal to

$$L_i(63, 43; \vec{\theta}(d_i)) = \frac{S(63)}{S(43)} = \frac{0.9768}{0.9988} = 0.9779. \tag{20}$$

As a final example, let us assume that an individual with the same smoking history is diagnosed with lung cancer at age 63. In this case, we must obtain $S'(63) = S_2'(58)$ to calculate the individual likelihood (Eq. 2). Using Eqs. 5–15 we have that

$$S'(63) = S_2'(58) = S_2(58) \cdot \sum_{j=1}^{8} \left( -\frac{\mu_{0,j} X}{\alpha_j} \right) \frac{\partial}{\partial t_k} \ln\left( f_j(t_{j-1}, 58) \right) = -3.1171 \times 10^{-3}. \tag{21}$$

So the likelihood contribution of this individual is equal to

$$L_i(63, 43; \vec{\theta}(d_i)) = \frac{-S'(63)}{S(43)} = \frac{3.1171 \times 10^{-3}}{0.9988} = 3.1208 \times 10^{-3}. \tag{22}$$
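The arithmetic of the two worked examples can be checked with a short sketch. The function names are hypothetical; the two likelihood forms are read off Eqs. 20 and 22 (survival ratio for an individual censored at the last age, and minus the survival derivative over the survival at entry for a diagnosed case), and the inputs are the rounded values reported in Eqs. 18, 19 and 21.

```python
import math

def likelihood_censored(surv_last, surv_entry):
    """Individual free of lung cancer at the last age: S(a_l) / S(a_e)."""
    return surv_last / surv_entry

def likelihood_case(surv_deriv_last, surv_entry):
    """Individual diagnosed at the last age: -S'(a_l) / S(a_e)."""
    return -surv_deriv_last / surv_entry

s_entry = math.exp(-0.00116)   # S(43) = S_2(38), from Eq. 18
s_last = 0.9768                # S(63) = S_2(58), Eq. 19
ds_last = -3.1171e-3           # S'(63) = S_2'(58), Eq. 21

lik_censored = likelihood_censored(s_last, s_entry)
lik_case = likelihood_case(ds_last, s_entry)
print(f"S(43) = {s_entry:.4f}")
print(f"censored: {lik_censored:.4f}, case: {lik_case:.4e}")
```

Because the inputs are rounded, the results should be expected to agree with Eqs. 20 and 22 only up to the last reported digit.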

Fortran and R code to compute the TSCE hazard and survival functions in the case of piecewise-constant parameters is available from the authors on request.

References

1. Risch HA, Howe GR, Jain M, Burch JD, Holowaty EJ, Miller AB

(1993) Are female smokers at higher risk for lung cancer than

male smokers? A case–control analysis by histologic type. Am J

Epidemiol 138(5):281–293

2. Zang EA, Wynder EL (1996) Differences in lung cancer risk

between men and women: examination of the evidence. J Natl

Cancer Inst 88(3–4):183–192

3. Perneger TV (2001) Sex, smoking, and cancer: a reappraisal.

J Natl Cancer Inst 93(21):1600–1602

4. Bain C, Feskanich D, Speizer FE, Thun M, Hertzmark E, Rosner

BA, et al (2004) Lung cancer rates in men and women with com-

parable histories of smoking. J Natl Cancer Inst 96(11):826–834

5. Blot WJ, McLaughlin JK (2004) Are women more susceptible to

lung cancer? J Natl Cancer Inst 96(11):812–813

6. Patel JD, Bach PB, Kris MG (2004) Lung cancer in US women: a

contemporary epidemic. JAMA 291(14):1763–1768

7. Moolgavkar SH, Venzon DJ (1979) Two-event models for car-

cinogenesis: incidence curves for childhood and adult tumors.

Math Biosci 47(1):55–77

8. Moolgavkar SH, Knudson AG (1981) Mutation and cancer: a

model for human carcinogenesis. J Natl Cancer Inst 66(6):

1037–1052

9. Hazelton WD, Luebeck EG, Heidenreich WF, Moolgavkar SH

(2001) Analysis of a historical cohort of Chinese tin miners with

arsenic, radon, cigarette smoke, and pipe smoke exposures using

the biologically based two-stage clonal expansion model. Radiat

Res 156(1):78–94

10. Hazelton WD, Clements MS, Moolgavkar SH (2005) Multistage

carcinogenesis and lung cancer mortality in three cohorts. Cancer

Epidemiol Biomarkers Prev 14(5):1171–1181

11. Heidenreich WF, Luebeck EG, Moolgavkar SH (1997) Some

properties of the hazard function of the two-mutation clonal

expansion model. Risk Anal 17(3):391–399

12. US Decennial Life Tables for 1989–91, vol 1 (1997) US Department of Health and Human Services, National Center for Health Statistics, Hyattsville, Maryland

13. Gilks WR, Richardson S, Spiegelhalter DJ (eds) (1996) Markov

chain Monte Carlo in practice. Interdisciplinary Statistics.

Chapman & Hall, London

14. Bain C, Feskanich D, Speizer FE, Thun M, Hertzmark E, Rosner

BA, et al (2004) Erratum. J Natl Cancer Inst 96(23):1796–1797


15. Wakelee HA, Chang ET, Gomez SL, Keegan TH, Feskanich D,

Clarke CA, et al (2007) Lung cancer incidence in never smokers.

J Clin Oncol 25(5):472–478

16. Rachet B, Siemiatycki J, Abrahamowicz M, Leffondre K (2004)

A flexible modeling approach to estimating the component

effects of smoking behavior on lung cancer. J Clin Epidemiol

57(10):1076–1085

17. Luebeck EG, Heidenreich WF, Hazelton WD, Paretzke HG,

Moolgavkar SH (1999) Biologically based analysis of the data for

the Colorado uranium miners cohort: age, dose and dose-rate

effects. Radiat Res 152(4):339–351

18. Heidenreich WF, Wellmann J, Jacob P, Wichmann HE (2002)

Mechanistic modelling in large case–control studies of lung

cancer risk from smoking. Stat Med 21(20):3055–3070

19. Schollnberger H, Manuguerra M, Bijwaard H, Boshuizen H,

Altenburg HP, Rispens SM, et al (2006) Analysis of epidemio-

logical cohort data on smoking effects and lung cancer with a

multi-stage cancer model. Carcinogenesis 27(7):1432–1444

20. Burns DM, Shanks TG, Choi W, Thun MJ, Heath CW, Garfinkel

L (1997) The American Cancer Society Cancer Prevention Study

I: 12-year followup of 1 million men and women. In: Burns DM,

Garfinkel L, Samet JM (eds) Changes in cigarette-related disease

risks and their implications for prevention and control. Smoking

and Tobacco Control, Monograph 8, NIH Publ No 97–4213, pp

113–304

21. Shriver SP, Bourdeau HA, Gubish CT, Tirpak DL, Davis AL,

Luketich JD, et al (2000) Sex-specific expression of gastrin-

releasing peptide receptor: relationship to smoking history and

risk of lung cancer. J Natl Cancer Inst 92(1):24–33

22. Siegfried JM, DeMichele MA, Hunt JD, Davis AG, Vohra KP,

Pilewski JM (1997) Expression of mRNA for gastrin-releasing

peptide receptor by human bronchial epithelial cells. Association

with prolonged tobacco exposure and responsiveness to bombe-

sin-like peptides. Am J Respir Crit Care Med 156(2 Pt 1):

358–366

23. Hauptmann M, Lubin JH, Rosenberg P, Wellmann J, Kreienbrock

L (2000) The use of sliding time windows for the exploratory

analysis of temporal effects of smoking histories on lung cancer

risk. Stat Med 19(16):2185–2194

24. Leffondre K, Abrahamowicz M, Siemiatycki J, Rachet B (2002)

Modeling smoking history: a comparison of different approaches.

Am J Epidemiol 156(9):813–823

25. Peto R, Darby S, Deo H, Silcocks P, Whitley E, Doll R (2000)

Smoking, smoking cessation, and lung cancer in the UK since

1950: combination of national statistics with two case–control

studies. BMJ 321(7257):323–329

26. Bach PB, Kattan MW, Thornquist MD, Kris MG, Tate RC,

Barnett MJ, et al (2003) Variations in lung cancer risk among

smokers. J Natl Cancer Inst 95(6):470–478
