Test for informative cluster size with survival data. Alessandra Meddis (1), A. Latouche (1,2). 1. Institut Curie, U900, F-92210, Saint Cloud; 2. Conservatoire National des Arts et Métiers, Paris. GDR Statistique et Santé, October 11th



  • Test for informative cluster size with survival data

    Alessandra Meddis (1), A. Latouche (1,2)

    1. Institut Curie, U900, F-92210, Saint Cloud
    2. Conservatoire National des Arts et Métiers, Paris

    GDR Statistique et Santé

    October 11th

  • Outline

    Context and motivation

    Notations and definitions

    Test statistic and its distribution

    Perspectives

    Alessandra MEDDIS (Institute Curie) October 11th 1 / 22

  • Motivation

    Clustered survival data:

    - observations contributed by the same cluster (e.g. individual, center) tend to be dependent, while those from different clusters are independent.

    General methodologies treat the cluster size as fixed by design. However, in some scenarios the cluster size can be informative for inference → Informative Cluster Size (ICS)

  • Motivating example

    French patients with hepatocellular carcinoma [1]:

    - 538 patients
      - cirrhosis
      - hepatitis B/C
    - 90 different institutions
      - different sample sizes (5-55)
      - patients in bigger hospitals have better prognosis
    - aim of the study: compare three scores for predicting survival

    Our goal is to investigate Informative Cluster Size (ICS): the situation where the outcome depends on the cluster size conditionally on a set of covariates.

    [1] S. Collette et al. Prognosis of advanced hepatocellular carcinoma: comparison of three staging systems in two French clinical trials. Annals of Oncology (2008)

  • Example data with ICS

    Some typical studies where the cluster size can be informative:

    - Dental data: the probability that a tooth falls out in an individual (cluster) is linked to the number of teeth (cluster size) of that individual.

    - Metastatic cancer data: several metastatic sites are explored. Sites from the same individual are correlated, and the number of metastatic sites has an impact on the response to treatment.

    - Meta-analysis: pooling data from different trials with different sample sizes.

    ♣ For examples 1 and 2 we would expect ICS because of the structure of the data, while for example 3 we would assume non informative cluster size.

  • Motivating example: ad-hoc analysis for ICS

    Kaplan-Meier estimator of the survival function at t∗ = 6 months for each cluster, used to study the relationship between the cluster sample sizes and the outcome.
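    A minimal sketch of this ad-hoc check, assuming the data are held in three parallel arrays (observed time, event indicator, cluster label). The function names and the toy data are hypothetical and are not taken from the study; the code only illustrates computing a per-cluster Kaplan-Meier value at a fixed time so that it can be plotted against the cluster size.

```python
import numpy as np

def km_survival_at(time, event, t_star):
    """Kaplan-Meier estimate of S(t_star) from one cluster's observations."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv = 1.0
    # Product over the distinct event times that are <= t_star.
    for t in np.unique(time[event & (time <= t_star)]):
        at_risk = np.sum(time >= t)            # members still under observation at t
        deaths = np.sum((time == t) & event)   # events occurring exactly at t
        surv *= 1.0 - deaths / at_risk
    return surv

def km_by_cluster(time, event, cluster, t_star=6.0):
    """Return (cluster sizes, KM survival at t_star), one entry per cluster."""
    time, event, cluster = map(np.asarray, (time, event, cluster))
    labels = np.unique(cluster)
    sizes = np.array([np.sum(cluster == c) for c in labels])
    surv = np.array([
        km_survival_at(time[cluster == c], event[cluster == c], t_star)
        for c in labels
    ])
    return sizes, surv

# Toy data (hypothetical): times in months, 1 = event, 0 = censored.
time = [2, 8, 5, 7, 3, 9, 10]
event = [1, 0, 1, 1, 0, 1, 0]
cluster = [0, 0, 1, 1, 1, 1, 1]
sizes, surv = km_by_cluster(time, event, cluster, t_star=6.0)
```

    Plotting `surv` against `sizes` (for instance on a logarithmic axis) gives the kind of display described here; a visible trend would suggest, but not prove, informative cluster size.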

  • Ad-hoc analysis with NICS

    Example where non informative cluster size is suggested:

    - IMENEO [2] meta-analysis for non metastatic breast cancer
    - 16 centers
    - correlation between failure times was detected

    [Figure: survival probability (%) against cluster sample size (logarithmic scale, but actual values are displayed).]

    [2] Bidard F, Michiels S, Riethdorf S, et al. Circulating tumor cells in breast cancer patients treated by neoadjuvant chemotherapy: a meta-analysis. JNCI: Journal of the National Cancer Institute 2018; 110(6): 560–567.

  • Formalism

    $(V_1, V_2, \ldots, V_K)$ is a sample of i.i.d. observations, where $V_i$ represents a cluster consisting of

    $$\bigl(n_i, (\tilde T_{i1}, \Delta_{i1}, X_{i1}), \ldots, (\tilde T_{in_i}, \Delta_{in_i}, X_{in_i})\bigr)$$

    - $n_i$: the cluster sample size
    - $\tilde T_{ij} = \min(T_{ij}, C_{ij})$: the observed failure time
    - $\Delta_{ij} = I(T_{ij} \le C_{ij})$: the censoring indicator
    - $X_{ij}$: the set of covariates, with $i = 1, \ldots, K$ and $j = 1, \ldots, n_i$

    We assume clustered data: in each cluster $i$, $(T_{i1}, T_{i2}, \ldots, T_{in_i})$ can be correlated conditionally on $(X_{i1}, X_{i2}, \ldots, X_{in_i})$.
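    As a small illustration of the observed quantities defined on this slide, the sketch below builds the observed time and the censoring indicator from latent failure and censoring times. The function name `observe` and the numbers are hypothetical.

```python
import numpy as np

def observe(failure_time, censoring_time):
    """Return (observed time, censoring indicator) for the members of one cluster."""
    T = np.asarray(failure_time, dtype=float)
    C = np.asarray(censoring_time, dtype=float)
    t_obs = np.minimum(T, C)        # observed time: min(T, C)
    delta = (T <= C).astype(int)    # 1 if the failure is observed, 0 if censored
    return t_obs, delta

# Hypothetical cluster with n_i = 3 members.
t_obs, delta = observe([3.0, 5.0, 2.0], [4.0, 1.0, 2.0])
```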

  • Two different marginal analyses

    When clustered data arise, two marginal analyses are of interest:

    - for the population of all observed members (AOM)
      - we refer to a typical individual randomly sampled from the entire population
      - each individual has equal weight, so larger clusters contribute more to inference
    - for the typical member of a typical cluster (TOM)
      - we refer to a randomly selected individual from a randomly selected cluster
      - individuals within the same cluster have the same weight, and each cluster contributes equally to inference
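    The two weighting schemes can be made concrete with a short sketch. It assumes one cluster label per individual; the weights 1/N (AOM) and 1/(K n_i) (TOM) follow from the description above, while the function name and the toy outcome are hypothetical.

```python
import numpy as np

def marginal_weights(cluster):
    """Per-individual weights for the AOM and TOM marginal analyses."""
    cluster = np.asarray(cluster)
    labels, inverse, counts = np.unique(
        cluster, return_inverse=True, return_counts=True
    )
    n_i = counts[inverse]              # size of the cluster each individual belongs to
    K, N = len(labels), len(cluster)
    w_aom = np.full(N, 1.0 / N)        # every individual counts the same
    w_tom = 1.0 / (K * n_i)            # every cluster counts the same
    return w_aom, w_tom

# Toy example: a cluster of size 4 where everyone has outcome 1,
# and a cluster of size 1 where the outcome is 0.
cluster = np.array([0, 0, 0, 0, 1])
y = np.array([1.0, 1.0, 1.0, 1.0, 0.0])
w_aom, w_tom = marginal_weights(cluster)
mean_aom = np.sum(w_aom * y)   # 0.8: dominated by the large cluster
mean_tom = np.sum(w_tom * y)   # 0.5: both clusters weigh equally
```

    In this toy case the outcome is tied to the cluster size, so the two marginal means differ.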

  • Two marginal analyses: illustration

  • (Non) Informative cluster size

    Let $r_k$ be the index of a randomly selected member of cluster $k$. Hoffman et al. [2001] define non informative cluster size (NICS) as

    $$P(D_{r_k}(t) = 1 \mid X_{r_k} = x, N_k) = P(D_{r_k}(t) = 1 \mid X_{r_k} = x)$$

    otherwise the cluster size is said to be informative (ICS).

    - Given large enough sample sizes, the two marginal analyses coincide under NICS [3]
    - Under ICS they differ in general → it is important to specify which quantity is of interest.

    [3] S. Seaman, M. Pavlou, and A. Copas. Review of methods for handling confounding by cluster and informative cluster size in clustered data. Statistics in Medicine, 33(30):5371–5387, 2014

  • Consequences of ICS

    When informative cluster size is detected, more care is needed in the interpretation of results:

    - the estimated quantities depend on the distribution of $N_k$ (the study design used to collect the data), which is specific to the population under analysis.
    - it is challenging to generalize the results to other populations

    → appropriate methods that take into account the information carried by the cluster sample size are necessary. Several approaches have been proposed, motivated by data that rely on the assumption of ICS, but no formal test was performed.

    ♣ We propose a test for informative cluster size with survival data.

  • Illustration: Non informative cluster size

    [Diagram with nodes $T_{ik}$, $U_k$, $X_{ik}$, $N_k$]

    $U_k$ is the random effect for the unmeasured covariates which are common to all members of the same cluster $k$ (correlated failure times)

    $N_k$ does not affect $T_{ik}$ → non informative cluster size

  • Illustration: Informative cluster size

    [Diagram with nodes $T_{ik}$, $U_k$, $X_{ik}$, $N_k$]

    $U_k$ is the random effect for the unmeasured covariates which are common to all members of the same cluster $k$ (correlated failure times)

    $N_k$ affects $T_{ik}$ → informative cluster size

  • Notations

    Let $i = 1, 2, \ldots, K$ index the clusters and $j = 1, 2, \ldots, n_i$ the individuals within cluster $i$, with $N = \sum_i n_i$. We define:

    - $N_{ij}(t) = I(\tilde T_{ij} \le t, \Delta_{ij} = 1)$: the counting process
    - $\alpha_{ij}(t) Y_{ij}(t)$: the intensity
    - $Y_{ij}(t) = I(\tilde T_{ij} \ge t)$: the at-risk process

    $M_{ij}(t) = N_{ij}(t) - \Lambda_{ij}(t)$ is a martingale with respect to the filtration $\mathcal{F}_{ij}(t) = \sigma\{N_{ij}(u), Y_{ij}(u) : 0 \le u \le t\}$.
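    For readers less used to counting-process notation, the two indicator processes can be evaluated at a given time as in the sketch below. The function names and the three example individuals are hypothetical.

```python
import numpy as np

def counting_process(t_obs, delta, t):
    """N(t) per individual: 1 if an event has been observed by time t, else 0."""
    t_obs = np.asarray(t_obs, dtype=float)
    delta = np.asarray(delta)
    return ((t_obs <= t) & (delta == 1)).astype(int)

def at_risk_process(t_obs, t):
    """Y(t) per individual: 1 if still under observation just before t, else 0."""
    return (np.asarray(t_obs, dtype=float) >= t).astype(int)

# Three hypothetical individuals: event at 2, censored at 5, event at 7.
t_obs = [2.0, 5.0, 7.0]
delta = [1, 0, 1]
n_at_5 = counting_process(t_obs, delta, 5.0)
y_at_5 = at_risk_process(t_obs, 5.0)
```

    At time 5 only the first individual has had an observed event, while the second and third are still at risk.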

  • Nelson-Aalen estimator

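    We define the Nelson-Aalen estimator of the cumulative hazard for the two marginal analyses:

    $$\hat\Lambda_{tom}(t) = \int_0^t \frac{dN_{tom}(s)}{Y_{tom}(s)} \quad\text{with}\quad N_{tom}(t) = \frac{1}{K}\sum_i \frac{1}{n_i}\sum_j N_{ij}(t)$$

    $$\hat\Lambda_{aom}(t) = \int_0^t \frac{dN_{aom}(s)}{Y_{aom}(s)} \quad\text{with}\quad N_{aom}(t) = \frac{1}{N}\sum_i \sum_j N_{ij}(t)$$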
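    A hedged sketch of the two estimators as weighted Nelson-Aalen sums. It assumes that the at-risk processes are weighted in the same way as the counting processes (weights 1/(K n_i) for TOM and 1/N for AOM); this normalisation of the at-risk processes is an assumption, as it is not spelled out on the slide. Function names and toy data are hypothetical.

```python
import numpy as np

def weighted_nelson_aalen(time, event, weight):
    """Return (event times, cumulative hazard) for a weighted Nelson-Aalen estimator."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    weight = np.asarray(weight, dtype=float)
    event_times = np.unique(time[event])
    increments = np.empty(len(event_times))
    for k, s in enumerate(event_times):
        d_n = np.sum(weight[(time == s) & event])  # jump of the weighted counting process
        y = np.sum(weight[time >= s])              # weighted at-risk process at s
        increments[k] = d_n / y
    return event_times, np.cumsum(increments)

def nelson_aalen_tom_aom(time, event, cluster):
    """Cumulative hazard under the TOM and AOM weightings, on a common time grid."""
    cluster = np.asarray(cluster)
    labels, inverse, counts = np.unique(
        cluster, return_inverse=True, return_counts=True
    )
    K, N = len(labels), len(cluster)
    w_tom = 1.0 / (K * counts[inverse])   # each cluster contributes equally
    w_aom = np.full(N, 1.0 / N)           # each individual contributes equally
    times, cum_tom = weighted_nelson_aalen(time, event, w_tom)
    _, cum_aom = weighted_nelson_aalen(time, event, w_aom)
    return times, cum_tom, cum_aom

# Toy data: one cluster of size 3 and one of size 1.
times, cum_tom, cum_aom = nelson_aalen_tom_aom(
    time=[1, 2, 3, 4], event=[1, 1, 1, 0], cluster=[0, 0, 0, 1]
)
```

    With equal cluster sizes the two sets of weights are identical, so the two curves coincide exactly; they can only differ when cluster sizes vary.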

  • Test statistic

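    Test for Informative Cluster Size:

    - $H_0$: equality of the intensity of the process $N_{ij}(t)$ obtained by the two analyses (tom/aom) at each time $t$:

    $$H_0 : \frac{\frac{1}{K}\sum_i \frac{1}{n_i}\sum_j \alpha_{ij}(t) Y_{ij}(t)}{Y_{tom}} = \frac{\frac{1}{N}\sum_i \sum_j \alpha_{ij}(t) Y_{ij}(t)}{Y_{aom}} = \alpha_k(t) Y_k(t) \quad \forall t$$

    - test statistic:

    $$Z(\tau) = \int_0^\tau L(t)\,\bigl(d\hat\Lambda_{tom} - d\hat\Lambda_{aom}\bigr)$$

    where $L(\cdot)$ is a weight function.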
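    A minimal sketch of how the statistic could be evaluated on data: the integral becomes a sum over the observed event times up to τ of the weight times the difference of the two Nelson-Aalen increments. Two assumptions are made here and are not confirmed by the slides: the at-risk processes carry the same weights as the counting processes (1/(K n_i) and 1/N), and the weight is taken as the product of the two at-risk processes divided by K. The function name is hypothetical.

```python
import numpy as np

def ics_statistic(time, event, cluster, tau):
    """Sum over event times <= tau of L(s) * (dLambda_tom(s) - dLambda_aom(s))."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    cluster = np.asarray(cluster)
    labels, inverse, counts = np.unique(
        cluster, return_inverse=True, return_counts=True
    )
    K, N = len(labels), len(time)
    w_tom = 1.0 / (K * counts[inverse])   # TOM weights (assumed normalisation)
    w_aom = np.full(N, 1.0 / N)           # AOM weights (assumed normalisation)
    z = 0.0
    for s in np.unique(time[event & (time <= tau)]):
        at_risk = time >= s
        jump = (time == s) & event
        y_tom = w_tom[at_risk].sum()
        y_aom = w_aom[at_risk].sum()
        d_tom = w_tom[jump].sum() / y_tom   # increment of the TOM Nelson-Aalen estimator
        d_aom = w_aom[jump].sum() / y_aom   # increment of the AOM Nelson-Aalen estimator
        weight = y_aom * y_tom / K          # assumed choice of L(s)
        z += weight * (d_tom - d_aom)
    return z

# Unequal cluster sizes: the statistic can be non-zero.
z_unequal = ics_statistic([1, 2, 3, 4], [1, 1, 1, 0], [0, 0, 0, 1], tau=5.0)
# Equal cluster sizes: both weightings coincide and the statistic is zero.
z_equal = ics_statistic([1, 2, 3, 4], [1, 1, 0, 1], [0, 0, 1, 1], tau=5.0)
```

    The raw value is not interpretable on its own; it needs to be standardised by an estimate of its variance before being compared with a Gaussian reference.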

  • Under NICS

    Under the null hypothesis:

    - we define $L(t) = \frac{Y_{aom}(t)\, Y_{tom}(t)}{K}$
    - with some algebra we can rewrite

    $$\frac{Z(\tau)}{\sqrt{K}} = \frac{1}{\sqrt{K}} \sum_{i=1}^{K} \int_0^\tau W_i(t)\, dM_i(t), \qquad W_i(t) = \frac{Y_{aom}(t)}{n_i K} - \frac{Y_{tom}(t)}{K}$$

    - $\frac{1}{\sqrt{K}} \sum_i \sum_j \int_0^\tau dM_{ij}$ converges to a Gaussian process [4]

    [4] Z. Ying and L. J. Wei. The Kaplan-Meier estimate for dependent failure time observations. Journal of Multivariate Analysis, vol. 50, pp. 17-29, 1994

  • Asymptotic distribution

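    Assume that there exist $y_{aom}(t)$, $y_{tom}(t)$ such that, for $N \to \infty$,

    $$\frac{Y_{aom}}{n_i K} \to y_{aom}(t), \qquad \frac{Y_{tom}}{K} \to y_{tom}(t)$$

    ⇒ $Z(\tau)/\sqrt{K}$ is asymptotically equivalent to a Gaussian with mean 0 and covariance

    $$V = \frac{1}{N}\sum_i \sum_j \sum_{j'} \varepsilon_{ij}\,\varepsilon_{ij'} \quad\text{with}\quad \varepsilon_{ij} = \int_0^\tau \omega_i(t)\, dM_{ij}(t)$$

    estimated by

    $$\hat\varepsilon_{ij} = \Delta_{ij}\,\omega_i(T_{ij}) - \sum_k \sum_l \frac{\Delta_{kl}\,\omega_i(T_{kl})\, Y_{ij}(T_{kl})}{\sum_m \sum_f Y_{mf}(T_{kl})}$$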
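    The estimated residual-type terms and the resulting variance can be transcribed into code as in the sketch below. The cluster-specific weight functions are passed in as a hypothetical array `omega`, with `omega[i, b]` holding the weight of cluster i evaluated at the observed time of observation b; how these weights are computed is left open here and would have to follow the definitions above. The function name is also hypothetical.

```python
import numpy as np

def residuals_and_variance(time, event, cluster, omega):
    """Estimated residuals per observation and the clustered variance estimate."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    cluster = np.asarray(cluster)
    omega = np.asarray(omega, dtype=float)       # shape (K, N), hypothetical input
    labels, inverse = np.unique(cluster, return_inverse=True)
    N = len(time)
    # risk_set[a, b] = 1 if observation a is still at risk at the observed time of b.
    risk_set = (time[:, None] >= time[None, :]).astype(float)
    n_at_risk = risk_set.sum(axis=0)             # total number at risk at each observed time
    w = omega[inverse]                           # row a: weights of the cluster containing a
    # Second term: sum over events b of weight * Y_a(T_b) / (number at risk at T_b).
    compensator = (risk_set * w * (event / n_at_risk)[None, :]).sum(axis=1)
    idx = np.arange(N)
    eps = event * w[idx, idx] - compensator      # first term uses the weight at a's own time
    # Sum residuals within clusters, then average the squared cluster sums over N.
    cluster_sums = np.bincount(inverse, weights=eps, minlength=len(labels))
    variance = np.sum(cluster_sums ** 2) / N
    return eps, variance

# Toy check with constant weights equal to 1 (illustrative only).
time = [1, 2, 3, 4]
event = [1, 1, 1, 0]
cluster = [0, 0, 0, 1]
eps, variance = residuals_and_variance(time, event, cluster, np.ones((2, 4)))
```

    With constant weights the residuals reduce to ordinary Nelson-Aalen martingale residuals, which sum to zero over the whole sample; this gives a simple sanity check of the transcription.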

  • Simulation design

    We conduct a simulation to check the asymptotic distribution of the test statistic.

    Correlated survival data with NICS:

    - shared frailty model
    - frailty $U_k \sim \mathrm{Gamma}(1.4)$ → $\mathrm{var}(U_k) = 0.7$
    - no covariates

    K = 40 clusters with sample sizes $N_k \in [20, 70]$; M = 1000 replications

    [Figure: Statistic distribution under NICS; density of Z, horizontal axis from −3 to 3.]
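    One way such data could be generated is sketched below. Several choices are assumptions that do not come from the slides: Gamma(1.4) is read as shape 1.4 and rate 1.4 (mean 1, variance 1/1.4 ≈ 0.71, close to the quoted 0.7), the baseline hazard is constant, censoring is independent and exponential, and cluster sizes are drawn uniformly on the integers 20 to 70. The function name and default rates are hypothetical.

```python
import numpy as np

def simulate_nics(K=40, n_min=20, n_max=70, shape=1.4,
                  base_rate=1.0, cens_rate=0.3, seed=0):
    """Simulate clustered survival data from a shared gamma frailty model under NICS."""
    rng = np.random.default_rng(seed)
    # Cluster sizes are drawn independently of the frailty: non informative cluster size.
    sizes = rng.integers(n_min, n_max + 1, size=K)
    frailty = rng.gamma(shape, 1.0 / shape, size=K)   # mean 1, variance 1 / shape
    cluster = np.repeat(np.arange(K), sizes)
    u = frailty[cluster]                              # frailty shared within each cluster
    # Given the frailty, failure times are exponential with rate base_rate * u (assumed).
    T = rng.exponential(1.0 / (base_rate * u))
    C = rng.exponential(1.0 / cens_rate, size=len(cluster))  # assumed censoring mechanism
    time = np.minimum(T, C)
    event = (T <= C).astype(int)
    return time, event, cluster

time, event, cluster = simulate_nics()
```

    Repeating the call with different seeds (for instance 1000 times) and computing the standardised statistic on each replicate would give an empirical distribution to compare with the standard Gaussian.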

  • Ongoing work

    Simulation study:

    - assess the power of the test for different numbers of clusters and cluster sample sizes
    - introduce covariates

    Apply the test of ICS to the hepatocellular carcinoma example.

  • References I

    Hoffman, E. B., Sen, P. K., and Weinberg, C. R. (2001). Within-cluster resampling. Biometrika, 88(4):1121–1134.

    Seaman, S. R., Pavlou, M., and Copas, A. J. (2014). Methods for observed-cluster inference when cluster size is informative: A review and clarifications. Biometrics, 70(2):449–456.

    Williamson, J. M., Kim, H.-Y., Manatunga, A., and Addiss, D. G. (2008). Modeling survival data with informative cluster size. Statistics in Medicine, 27(4):543–555.

  • Thank you for your attention

  • Two marginal analyses: Illustration 2
