New methods based on the adaptive ridge procedure to take ... · New methods based on the adaptive ridge procedure to take into account age, period and cohort e ects Olivier Bouaziz1

New methods based on the adaptive ridge procedure totake into account age, period and cohort effects

Olivier Bouaziz1 and Vivien Goepp1

with Gregory Nuel3 and Jean-Christophe Thalabard1

1MAP5 (CNRS 8145), Universite Paris Descartes, Sorbonne Paris Cite3LPSM (CNRS 8001), Sorbonne Universite, Paris

Seminaire du CepiDc

Olivier Bouaziz and Vivien Goepp The AR procedure for APC models June 18, 2019 1 / 43

1 Background in time to event analysis

2 The adaptive ridge procedure for piecewise constant hazards

3 Bidimensional estimation of the hazard rate

4 Extension of the age-period-cohort model


Outline






Background in time to event analysis

I We study a positive continuous time to event variable T .

I T represents the time difference between event of interest and patiententry.

Study start Patient entry Ev. of interest

T

I Examples : time to relapse of Leukemia patients, time to onset ofcancer, time to death . . .


Background in time to event analysis : right censoring

Study start End of study

T1

T obs2

T2

T obs3

T3

T4

T obs4


The observations, the hazard rate and the likelihood

I Observations :{T obsi = Ti ∧ Ci

∆i = 1Ti≤Ci

I Independent censoring : T ⊥⊥ C

I The hazard rate is defined as :

λ(t) := lim4t→0

P[t ≤ T < t +4t|T ≥ t]

4t

I The likelihood of the observed data is equal to :

n∏i=1

f (T obsi )∆iS(T obs

i )1−∆i =n∏

i=1

λ(T obsi )∆i exp

(−∫ T obs

i

0λ(t)dt

),

where f is the density of T and S(t) = P[T > t].


Outline






The piecewise constant hazard model

I The model :

λ(t) =K∑

k=1

λk1ck−1<t≤ck

I Goal : estimate the λks.

The log-likelihood is equal to :

`n(λ) =K∑

k=1

{Ok log (λk)− λk Rk

},

where

I Ok =∑

i ∆i1ck−1<T obsi ≤ck

: number of observed events in interval

(ck−1, ck ]

I Rk =∑

i (Tobsi ∧ ck − ck−1) : total time at risk in interval (ck−1, ck ]


The piecewise constant hazard model

I Ok : number of observed events in interval (ck−1, ck ]

I Rk : total time at risk in interval (ck−1, ck ]

The maximum likelihood estimator is explicit :

λmlek =

Ok

Rk

I We want to choose the number and location of the cuts from the data

I We start from a large grid of cuts (K = 100, 1 000, . . .)

I We use a penalization technique to constrain adjacent cut values tobe equal.


Penalizing the maximum likelihood estimator

Set log λk = ak . Estimation of a is achieved through penalized

log-likelihood :

`penn (a) = `n(a)︸︷︷︸

log-likelihood

− pen

2

{K−1∑k=1

wk (ak+1 − ak)2

}︸︷︷︸

regularization term

,

I w represents a weight

I pen is a penalty term


Penalizing the maximum likelihood estimator

Set log λk = ak . Estimation of a is achieved through penalized

log-likelihood :

`penn (a) = `n(a)︸︷︷︸

log-likelihood

− pen

2

{K−1∑k=1

wk (ak+1 − ak)2

}︸︷︷︸

regularization term

,

I w represents a weight

I pen is a penalty term


Two types of regularization

1. L2 regularization (Ridge) with w = 1

2. L0 regularization with the adaptive ridge procedure.Iterative updates of the weights :

wk =((

ak+1 − ak)2

+ ε2)−1

,

with ε� 1.

F. Frommlet and G. Nuel, An Adaptive Ridge Procedure for L0Regularization. PlosOne (2016).


L0 norm approximation

When ε� 1 :

wk (ak+1 − ak)2 ' ‖ak+1 − ak‖20 =

{0 if ak+1 = ak

1 if ak+1 6= ak

●ε

0.00

0.25

0.50

0.75

1.00

−5.0e−06 −2.5e−06 0.0e+00 2.5e−06 5.0e−06

x

Légende●

●

||x||02

x2

x2 + ε2


Maximization of the penalized log-likelihood

I The penalized estimator is no longer explicit

I Maximization is performed from the Newton-Raphson algorithm. Fora given sequence of weights w , the mth Newton Raphson iterationstep is obtained from the equation

a(m) = a

(m−1) + I (a(m−1),w)−1U(a(m−1),w),

where I is the opposite of the Hessian matrix, U is the score vector.

I The Hessian matrix is tri-diagonal

I =⇒ computation time for the inversion of the Hessian is O(K )


The Adaptive Ridge procedure for a given penalty

procedure Adaptive-Ridge(O,R, pen)(a,w , sel) ← (0,1,0)while not converge do

anew ← Newton-Raphson(O,R, pen, a,w)

wnewk ←

((anewk+1 − anew

k

)2+ ε2

)−1

selnewk ← wnew

k

(anewk+1 − anew

k

)2

(a,w ,sel) ← (anew,wnew,selnew)end while

Compute (Osel,Rsel)exp(amle) ← O

sel/Rsel

return amle

end procedure


The Adaptive Ridge procedure for a given penalty

procedure Adaptive-Ridge(O,R, pen)(a,w , sel) ← (0,1,0)while not converge do

anew ← Newton-Raphson(O,R, pen, a,w)

wnewk ←

((anewk+1 − anew

k

)2+ ε2

)−1

selnewk ← wnew

k

(anewk+1 − anew

k

)2

(a,w ,sel) ← (anew,wnew,selnew)end whileCompute (Osel,Rsel)exp(amle) ← O

sel/Rsel

return amle

end procedure


Comparison of the two regularization methods

pen = 0 =⇒ a = amle

pen =∞ =⇒ a = constant

1e−01 1e+01 1e+03 1e+05

−8

−7

−6

−5

−4

−3

−2

Penalty

Pa

ram

ete

r va

lue

(o

n t

he

lo

g−

sca

le)

L2 regularization

1e−01 1e+01 1e+03 1e+05

−8

−7

−6

−5

−4

−3

−2

Penalty

Pa

ram

ete

r va

lue

(o

n t

he

lo

g−

sca

le)

L0 regularization


Comparison of the two regularization methods

pen = 0 =⇒ a = amle

pen =∞ =⇒ a = constant

1e−01 1e+01 1e+03 1e+05

−8

−7

−6

−5

−4

−3

−2

Penalty

Pa

ram

ete

r va

lue

(o

n t

he

lo

g−

sca

le)

L2 regularization

1e−01 1e+01 1e+03 1e+05

−8

−7

−6

−5

−4

−3

−2

Penalty

Pa

ram

ete

r va

lue

(o

n t

he

lo

g−

sca

le)

L0 regularization


Model selection for the Adaptive Ridge estimator (n = 400)

0 20 40 60 80

0.00

0.02

0.04

0.06

0.08

0.10

Time

Haz

ard

I In red the true hazard function

I In black the hazard estimator for pen = 0.1



0 20 40 60 80

0.00

0.02

0.04

0.06

0.08

0.10

Time

Haz

ard





0 20 40 60 80

0.00

0.02

0.04

0.06

0.08

0.10

Time

Haz

ard





0 20 40 60 80

0.00

0.02

0.04

0.06

0.08

0.10

Time

Haz

ard





0 20 40 60 80

0.00

0.02

0.04

0.06

0.08

0.10

Time

Haz

ard





0 20 40 60 80

0.00

0.02

0.04

0.06

0.08

0.10

Time

Haz

ard





0 20 40 60 80

0.00

0.02

0.04

0.06

0.08

0.10

Time

Haz

ard




Model selection for the Adaptive Ridge estimator

Three different methods to perform model selection :

1. BIC(m) = −2`n(amlem ) + m log n

2. AIC(m) = −2`n(amlem ) + 2m

3. K-fold Cross Validation (CV),

with m the dimension of the model :

m =K−1∑k=0

1{amlek+1,m − amle

k,m 6= 0}.


Model selection for the Adaptive Ridge estimator using theBIC (n = 400)

0.1 0.5 1.0 5.0 10.0 50.0 100.0

−6

−5

−4

−3

−2

Penalty

Pa

ram

ete

r va

lue

(o

n th

e lo

g−

sca

le)

Regularization path

0 20 40 60 80

0.0

00

.01

0.0

20

.03

0.0

40

.05

Time

Ha

za

rd

Hazard estimator (in black)


Outline






The Lexis diagram

period

age

1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 20000

10

20

30

40

50

60

70

80

90

100

◦

×

◦

×

cohort

period

age

Age-Period Lexis diagram

cohort

age

1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 20000

10

20

30

40

50

60

70

80

90

100

◦

×

◦

×

cohort

period

age

Age-Cohort Lexis diagram

Key relation : cohort+age=period


The Lexis diagram

period

age

1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 20000

10

20

30

40

50

60

70

80

90

100

◦

×

◦

×

cohort

period

age

Age-Period Lexis diagram

cohort

age

1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 20000

10

20

30

40

50

60

70

80

90

100

◦

×

◦

×

cohort

period

age

Age-Cohort Lexis diagram

Key relation : cohort+age=period


The SEER data

I Huge american registry dataset of breast cancerhttps ://seer.cancer.gov

I Primary, unilateral, malignant and invasive cancers

I 1.2 million of patients, 60% of censoring

I The cancer diagnosis range from 1973 to 2014

I The time from cancer diagnosis to death or censoring ranges from 0to 41 years.

I The variable of interest is the time from cancer diagnosis until death.

Aim : estimate the hazard of death as a function of both date of cancerdiagnosis and time since diagnosis.

I We use the adaptive ridge procedure

I Penalization over the two directions.

V. Goepp, J-C. Thalabard, G. Nuel and O. Bouaziz. RegularizedBidimensional Estimation of the Hazard Rate. Submitted

.Olivier Bouaziz and Vivien Goepp The AR procedure for APC models June 18, 2019 21 / 43

The SEER data

I Huge american registry dataset of breast cancerhttps ://seer.cancer.gov

I Primary, unilateral, malignant and invasive cancers

I 1.2 million of patients, 60% of censoring

I The cancer diagnosis range from 1973 to 2014

I The time from cancer diagnosis to death or censoring ranges from 0to 41 years.

I The variable of interest is the time from cancer diagnosis until death.

Aim : estimate the hazard of death as a function of both date of cancerdiagnosis and time since diagnosis.

I We use the adaptive ridge procedure

I Penalization over the two directions.

V. Goepp, J-C. Thalabard, G. Nuel and O. Bouaziz. RegularizedBidimensional Estimation of the Hazard Rate. Submitted.


The SEER data

I λj ,k : true hazard in rectangle (j , k)

I Oj ,k : number of observed events in rectangle (j , k)

I Rj ,k : total time at risk in rectangle (j , k)


`n(λ) =J∑

j=1

K∑k=1

{Oj ,k log (λj ,k)− λj ,kRj ,k}

Set log λj ,k = ηj ,k . Estimation of η through penalized log-likelihood :

`penn (η) = `n(η)︸︷︷︸

log-likelihood

− pen

2

∑j ,k

{vj ,k (ηj+1,k − ηj ,k)2 + wj ,k (ηj ,k+1 − ηj ,k)2

}︸︷︷︸

regularization term

.


The SEER data

I λj ,k : true hazard in rectangle (j , k)

I Oj ,k : number of observed events in rectangle (j , k)

I Rj ,k : total time at risk in rectangle (j , k)


`n(λ) =J∑

j=1

K∑k=1

{Oj ,k log (λj ,k)− λj ,kRj ,k}

Set log λj ,k = ηj ,k . Estimation of η through penalized log-likelihood :

`penn (η) = `n(η)︸︷︷︸

log-likelihood

− pen

2

∑j ,k

{vj ,k (ηj+1,k − ηj ,k)2 + wj ,k (ηj ,k+1 − ηj ,k)2

}︸︷︷︸

regularization term

.


The SEER datastage1 stage2 stage3

ag

e <

45

45

< a

ge

< 5

5a

ge

> 5

5

1990 1995 2000 2005 1990 1995 2000 2005 1990 1995 2000 2005

0

10

20

0

10

20

0

10

20

Date of diagnostic

Su

rviv

al tim

e a

fte

r d

iag

no

stic (

ye

ars

)

0.05

0.10

0.15

0.20

Mortality


Outline






RecapAge-period-cohort analysis

period

age

1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 20000

10

20

30

40

50

60

70

80

90

100

×

×

◦

×

◦

×

cohort

period

age

Lexis diagram : Age-Period

cohort

age

1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 20000

10

20

30

40

50

60

70

80

90

100

◦

×

◦

×

◦

×

cohort

period

age

Lexis Diagram : Age-Cohort


Age-period-cohort analysis

We want to infer the effect of age, period, and cohort.

I age effect : menopause

I cohort effect : carcinogenic baby food

I period effect : nuclear accident

We define one parameter vector for each effect : α, β et γ


Existing models

1. In the age-period-cohort model, we assume

log λj ,k = αj + βk + γj+k−1.

I Non-identifiable : we can eitherI infer ∆2α, ∆2β et ∆2γ.I add a constraint to the model.

2. In the age-cohort model, we assume

log λj ,k = αj + βk .

I J + K − 1 parameters for JK variables → regularizing

I Additive effect of the variables : strong a priori

B. Carstensen, Age–period–cohort models for the Lexis diagram, Statistics inmedicine, 2007.


Our approach : model the effects and their interactions

I The age-cohort and age-period-cohort models do not inferinteractions between effects.

I We introduce an age-cohort-interaction model :

log (λj ,k) = µ+ αj + βk + δj ,k ,

where δj ,k is the interaction (with δ1,k = δj ,1 = 0).

I We regularize over the differences of δj ,k .


Estimation in the ACI model

I The model parameter is θ = (µ,α,β, δ).

I Estimation using the adaptive ridge :

`penn (θ) = `n(θ)−pen

2

∑j ,k

{vj ,k (δj+1,k − δj ,k)2+wj ,k (δj ,k+1 − δj ,k)2

}.

I With adaptive ridge procedure over the interaction term δ.


Adaptive ridge procedureprocedure Adaptive-Ridge(O,R, pen)

θ ← 0v ← 1w ← 1while not converge do

θnew ← Newton-Raphson(O,R, pen, v ,w)

vnewj,k ←

((δnewj+1,k − δnew

j,k

)2

+ ε2

)−1

wnewj,k ←

((δnewj,k − δnew

j,k−1

)2

+ ε2

)−1

θ ← θnew

v ← vnew

w ← wnew

end whileCompute (Osel,Rsel) from (θnew, vnew,wnew)θmle ← O

sel/Rsel

return θmle

end procedure



θ ← 0v ← 1w ← 1

while not converge doθnew ← Newton-Raphson(O,R, pen, v ,w)

vnewj,k ←


j,k

)2

+ ε2

)−1

wnewj,k ←


j,k−1

)2

+ ε2

)−1

θ ← θnew

v ← vnew

w ← wnew


sel/Rsel

return θmle

end procedure





vnewj,k ←


j,k

)2

+ ε2

)−1

wnewj,k ←


j,k−1

)2

+ ε2

)−1

θ ← θnew

v ← vnew

w ← wnew

end while

Compute (Osel,Rsel) from (θnew, vnew,wnew)θmle ← O

sel/Rsel

return θmle

end procedure





vnewj,k ←


j,k

)2

+ ε2

)−1

wnewj,k ←


j,k−1

)2

+ ε2

)−1

θ ← θnew

v ← vnew

w ← wnew


sel/Rsel

return θmle

end procedure


Adaptive ridge procedure

cohort

age

j = 1

j = 2

j = 3

k = 1 k = 2 k = 3

cohort

age

j = 1

j = 2

j = 3

k = 1 k = 2 k = 3

cohort

age

j = 1

j = 2

j = 3

k = 1 k = 2 k = 3

2

1

1

2

1

1

2

2

3

(a) Representation of

vj ,k (δj+1,k − δj ,k)2

et wj ,k (δj ,k+1 − δj ,k)2(b) Corresponding graph

(c) Segmentation intoconnected components


Illustration : simulated dataSimulation setting

I J = 20 age intervals and K = 20 cohort intervals

I Sample the cohort uniformly

I Sample the age using the hazard rate (λj ,k)

I Uniform censoring over the age [75, 100]

I Infer (µ,α,β, δ) in the ACI model.

I We represent medians over 100 repetitions


Illustration : simulated dataSimulation design 1

cohort

1920

1940

1960

1980

age

20

40

60

80

hazard

0.02

0.04

0.06

0.08

True hazard

Cohort

1920

1940

1960

1980

Age

20

40

60

80

Hazard

0.05

0.10

Hazard MLE


Results with the ACI modelSimulation design 1

cohort

1920

1940

1960

1980

age

20

40

60

80

hazard

0.02

0.04

0.06

0.08

Estimated hazard λj,k

Cohort

1920

1940

1960

1980

Age

20

40

60

80

Hazard

1.0

1.2

1.4

1.6

1.8

Estimated interaction δj,k



Blue : true valuesRed : median estimate over 100 repetitions

0.0

0.5

1.0

1.5

25 50 75 100

Age

Effe

ct

Estimated age effect αj

−0.25

0.00

0.25

0.50

1925 1950 1975 2000

Cohort

Effe

ct

Estimated cohort effect βk



Cohort

1920

1940

1960

1980

Age

20

40

60

80

Hazard

0.01

0.02

0.03

0.04

0.05

True hazard

Cohort

1920

1940

1960

1980

Age

20

40

60

80

Hazard

0.00

0.01

0.02

0.03

0.04

0.05

Hazard MLE estimate



Cohort

1920

1940

1960

1980

Age

20

40

60

80

Hazard

0.02

0.03

0.04

0.05


Cohort

1920

1940

1960

1980

Age

20

40

60

80

Hazard

1.0

1.1

1.2

1.3

1.4




Cohort

1920

1940

1960

1980

Age

20

40

60

80

Hazard

0.01

0.02

0.03

0.04

0.05

True hazard

Cohort

1920

1940

1960

1980

Age

20

40

60

80

Hazard

0.01

0.02

0.03

0.04

0.05

0.06

Age-cohort model



cohort

1920

1940

1960

1980

age

20

40

60

80

hazard

0.02

0.03

0.04

0.05


cohort

1920

1940

1960

1980age

20

40

60

80

hazard

1.0

1.5

2.0



Conclusion and perspectives

Conclusion :

I Extends the age-cohort model

I More general than the age-period-cohort model

Perspectives :

I Bootstrapping to reduce sensitivity to outliers

I Application : Incidence of breast cancer in Norway (NOWAC cohort)

More info :

I R package : github.com/goepp/hazreg

I Website : www.math-info.univ-paris5.fr/~obouaziz andgoepp.github.io


Merci de votre attention


Bonus : Model selection for Adaptive RidgeBayesian criteria

I Problem : choose between M models M1, . . . ,MM of dimensionsq1, . . . , qM .

I Solution : maximize P(Mm|R,O) ∝ P(R,O|Mm)π(Mm).

I By approximation :

−2 log (P(Mm|R,O)) = 2`n(ηm) + qm log n−2 log π(Mm) +OP(1)

I We must choose the prior a priori π (Mm)


Model selection for Adaptive Ridge

BIC : π (Mm) = 1All the Mm are equiprobable

EBIC0 : P(Mm ∈M[qm]

)= 1

All the M[qm] are equiprobable

× ×××

×

××××

×××

×

Model sets M[1] M[2] M[qm] M[JK−1]M[JK ]

M[qm] is the set of models with qm parameters

J. Chen and Z. Chen, Extended Bayesian information criteria for model selection withlarge model spaces, Biometrika, 2008.


Documents

New methods based on the adaptive ridge procedure to take ... · New methods based on the adaptive ridge procedure to take into account age, period and cohort e ects Olivier Bouaziz1