articolo Dreassi ridotto


  • 7/30/2019 articolo Dreassi ridotto


    DOI: 10.1007/s10260-003-0078-7

    Statistical Methods & Applications (2004) 13: 87-101

    © Springer-Verlag 2004

    A multilevel Bayesian model for contextual effect of material deprivation

    Annibale Biggeri, Emanuela Dreassi, Marco Marchi

    Dipartimento di Statistica "G. Parenti", Università di Firenze, Viale Morgagni 59, 50134 Firenze, Italy

    (e-mail: {abiggeri,dreassi,marchi}@ds.unifi.it)

    Received: January 10, 2002 / Revised version: June 23, 2003

    Abstract. The relationship between socioeconomic factors and health has been studied in many circumstances. Whether the association takes place at the individual level only, or also at the population level (contextual effect), is still unclear. We present a multilevel hierarchical Bayesian model to investigate the joint contribution of individual and population-based socioeconomic factors to mortality, using data from the census cohort of the general population of the city of Florence, Italy (Tuscany Longitudinal Study, 1991-1995). Evidence supporting a contextual effect of deprivation on mortality at a very fine level of aggregation is found. Inappropriate modelling of individual and aggregate variables could strongly bias effect estimates.

    Key words: Hierarchical Bayesian model, multilevel model, material deprivation index, contextual effects, ecological fallacy

    1. Introduction: Material deprivation and contextual effect

    Material deprivation indicators usually refer to the occurrence of individual conditions such as unemployment, low education, living in a very small dwelling, overcrowding, or not having a car (see e.g. Townsend et al., 1988, Jarman, 1983 and Morris and Carstairs, 1991). So far, they have been used as aggregate-level covariates to adjust ecological regression coefficients in small area studies (St Leger, 1995). In fact, a strong association of area-based deprivation with mortality, on the one side, and of area-based deprivation with exposure to environmental/individual hazards, on the other side, has repeatedly been found (Eachus et al. 1996, Davey Smith et al. 1998, Pell et al. 2000). Many authors stressed the correlation of material deprivation with the prevalence of known risk factors, like cigarette smoking, in agreement with the hypothesis that material deprivation is responsible only for indirect effects or simply acts as a surrogate variable. For example, Sundquist et al. (1999) conducted an individual-level study showing that the prevalence of material deprivation is explained by individual-level risk factors for cardiovascular disease (obesity, hypertension).

    The research on the Tuscany Longitudinal Study (Studio Longitudinale Toscano, SLTo) was supported by the Regione Toscana Servizio Statistica.

    In epidemiology, however, the interpretation of the effect of any aggregate-level variable is controversial. Diez-Roux (1998) discussed the atomistic versus the ecological fallacy, and proposed to include aggregate variables at different levels of aggregation in individual studies. Only a few papers considered individual and aggregate data hierarchically structured: Anderson et al. (1997), for example, used the same variable (personal income) at the individual and at the aggregate level (the census tract median). The idea was to study whether the aggregate variable could still be predictive of the response after the individual-level variable had been considered (Firebaugh, 1978).

    In the present paper, the analysis of contextual variables such as material deprivation indicators is reframed by using Cronbach's model (Cronbach and Webb, 1975) and a Bayesian multilevel approach. Our aims are: 1) to show that different ways of modelling contextual variables assume different prior beliefs on the existence and nature of the effects, and 2) to show that a simple individual-level analysis could produce more biased results than a simple aggregate-level analysis. This is done using a material deprivation index and data derived from the Tuscany Longitudinal Cohort Study (see Biggeri et al. 2001).

    In Sect. 2, as a motivating example, we present the data and a descriptive analysis of mortality by city ward and deprivation level. Section 3 introduces the different statistical models used: individual, aggregate and multilevel models. More details are given to describe Bayesian modelling and the difference between the contextual and Cronbach approaches. The results and the main conclusions are shown in Sects. 4 and 5, respectively.

    2. Materials: the Florence Census Cohort Study

    The data come from a census-based cohort study. All residents in Florence (Tuscany, Italy) on census day, 31 October 1991, have been enrolled, and their mortality has been followed up by automated record-linkage procedures until 31 December 1995. The cause-of-death certificates have been collected by the Tuscany Mortality Register. Observed and expected deaths (all causes, males, age over 14 years), using internal standardization, have been calculated by census tract and by sub-urban area (city ward). A total of 163613 people have been enrolled, 639662.5 person-years at risk have been observed in the follow-up period, and 8612 deaths have occurred. The crude death rate was 13.46 per thousand, reflecting the high percentage of old people in the population considered. The city is composed of 14 city wards (Fig. 1) and 2752 census tracts (Fig. 2). Table 1 shows, for each city ward, the observed deaths for all causes, the corresponding Standardized Mortality Ratios (SMR), the Bayesian relative risks (RR) and the 95% credibility intervals, evaluated on the simulated posterior distributions (respectively as the mean and the 2.5% and 97.5% quantiles of the sampled values) estimated using the spatial Bayesian model of Besag et al. (1991). There is a strong gradient in mortality among city wards. Two wards (Mantignano and Ponte di Mezzo) appeared at higher risk (about 14% excess) and one (Poggetto) at significantly lower risk (about 10% deficit). Figure 3 shows the map of relative risks in the city: the western area appears more affected.

    Fig. 1. The wards of the City of Florence (Tuscany, Italy)
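    The SMRs in Table 1 rest on internal indirect age-standardization. A minimal sketch of that computation, assuming deaths and person-years tabulated by area and age group (the toy numbers and the function name `internal_smr` are hypothetical, not SLTo data): the reference rates are the whole-cohort age-specific rates, expected deaths per area are its person-years times those rates, and the SMR is observed over expected.

    ```python
    import numpy as np

    def internal_smr(deaths, person_years):
        """SMR per area via internal indirect standardization.

        deaths, person_years: arrays of shape (n_areas, n_age_groups).
        Reference rates are the age-specific rates of the whole cohort, so the
        expected deaths summed over all areas equal the observed total.
        """
        deaths = np.asarray(deaths, dtype=float)
        person_years = np.asarray(person_years, dtype=float)
        # Age-specific reference rates pooled over all areas (internal standard).
        ref_rates = deaths.sum(axis=0) / person_years.sum(axis=0)
        # Expected deaths per area: the area's person-years times the reference rates.
        expected = person_years @ ref_rates
        observed = deaths.sum(axis=1)
        return observed, expected, observed / expected

    # Hypothetical toy data: 3 areas x 2 age groups.
    deaths = [[10, 40], [12, 30], [5, 50]]
    pyears = [[5000, 2000], [6000, 1500], [4000, 2500]]
    observed, expected, smr = internal_smr(deaths, pyears)
    print(np.round(smr, 3))
    ```

    Because the standard is the cohort itself, the SMRs are relative to the city as a whole, which is why they scatter around 1 in Table 1.
    
    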

    Material deprivation has been defined for each individual as the number of the following unfavourable conditions: unemployment, low education (less than 6 years of schooling), poor housing (less than 25 square metres), and absence of a bathroom in the flat. Table 2 reports crude death rates by material deprivation. Material deprivation strongly affects mortality, with a clear trend from a rate of 13.12 per thousand among non-deprived people up to 20.66 per thousand among the most deprived (2 or more unfavourable conditions). The prevalence of deprived people by city ward is reported in Table 3. On a purely ecological comparison, there is some evidence that higher mortality correlates with a higher prevalence of material deprivation: the city wards with the highest and lowest mortality are also those with the highest and lowest prevalence of deprived people. The presence of a contextual effect could be speculated by restricting the analysis to the stratum of non-deprived people (Table 4): higher SMRs and Bayesian relative risks are still observed in the city wards with a higher prevalence of material deprivation.
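    The deprivation index and the rates in Table 2 can be sketched as follows. The index counts the four unfavourable conditions listed above and collapses 2 or more into one category. The paper does not say how the intervals in Table 2 were computed; the log-scale Wald interval used here, rate x exp(±1.96/√d) for d deaths, is an assumption, chosen because it gives values close to the published ones (for the 2+ row about 17.49-24.41 against 17.50-24.40). Function names are hypothetical.

    ```python
    import math

    def deprivation_index(unemployed, low_education, small_dwelling, no_bathroom):
        """Number of unfavourable conditions, with 2 or more collapsed to '2+'.

        Each argument is a boolean for one individual (hypothetical encoding):
        unemployed, < 6 years of schooling, dwelling < 25 m^2, no bathroom.
        """
        count = sum([unemployed, low_education, small_dwelling, no_bathroom])
        return "2+" if count >= 2 else str(count)

    def rate_ci_log_wald(deaths, rate_per_1000, z=1.96):
        """Approximate 95% CI for a rate, assuming a Wald interval on the log scale.

        The standard error of log(rate) is approximated by 1 / sqrt(deaths).
        """
        half_width = z / math.sqrt(deaths)
        return (rate_per_1000 * math.exp(-half_width),
                rate_per_1000 * math.exp(half_width))

    print(deprivation_index(False, True, True, False))  # two conditions -> "2+"
    lo, hi = rate_ci_log_wald(138, 20.66)                # the "2+" row of Table 2
    print(round(lo, 2), round(hi, 2))
    ```
    
    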

    The emphasis here is not on interpreting such hypothesized effects, but on showing how these kinds of data should be properly analyzed. The next section introduces the statistical models used.

    Table 1. Observed deaths, Standardized Mortality Ratios (SMR), Bayesian relative risks (RR) and 95% credibility interval (CI95%) for all-cause mortality by city ward, Florence 1991-1995, males. Tuscany Longitudinal Study

    city ward        obs   SMR    RR     CI95%
    Duomo            679   1.02   1.02   0.94-1.09
    Gavinana         719   0.94   0.95   0.88-1.01
    Santo Spirito    637   1.02   1.01   0.94-1.09
    Legnaia          877   1.00   1.01   0.95-1.08
    Mantignano       548   1.14   1.13   1.04-1.22
    Novoli           747   1.01   1.01   0.95-1.09
    Ponte di mezzo   440   1.15   1.13   1.03-1.24
    San Jacopino     529   1.09   1.08   1.00-1.17
    Le Panche        355   0.91   0.92   0.83-1.01
    Poggetto         755   0.90   0.91   0.85-0.97
    San Gallo        531   1.00   1.00   0.92-1.09
    Oberdan          795   1.02   1.01   0.95-1.08
    Campo di Marte   483   0.96   0.96   0.88-1.05
    Coverciano       530   0.95   0.95   0.87-1.03

    Table 2. Observed deaths and crude rates (per thousand, with 95% confidence interval, CI95%) for all-cause mortality by deprivation index, Florence 1991-1995, males. Tuscany Longitudinal Study

    deprivation index   obs    rate (per 1000)   CI95%
    0                   7166   13.12             12.80-13.40
    1                   1308   15.05             14.20-15.90
    2+                  138    20.66             17.50-24.40

    3. Methods

    We use individual data or cross-tabulated data (where the statistical unit is the cell, after count data have been generated by collapsing by deprivation level, categorized as 0, 1, 2 or more unfavourable events, census tract and age group). Let i denote the generic individual (or cell), j the census tract and k the city ward.

    Let $x_{ijk}$ denote the material deprivation index for the generic i-th subject (or cell) living in the j-th census tract within the k-th city ward, $\bar{x}_{jk}$ the census-tract average and $\bar{x}_k$ the city-ward average of the deprivation index. In order to compare the effect size of each variable in the subsequent regression analyses, all the variables have been standardized by dividing them by their respective sample standard deviations.

    We will compare the results from the following regression models:

    - Cox proportional hazards regression (using individual data and age as the time axis);
    - Poisson regression models (using cross-tabulated data).

    The models were fitted to data at different levels of aggregation:

    - individual level;
    - aggregate level;
    - multilevel, following both the contextual and Cronbach's definitions (see Cronbach and Webb, 1975, Boyd and Iversen, 1979 and Kreft et al., 1995).

    Table 3. Distribution of the study cohort by city ward and deprivation index (number of unfavourable events for each cohort member, see text), Florence 1991-1995, males. Tuscany Longitudinal Study

    city ward        index 0         index 1        index 2+      total
                     n (prop.)       n (prop.)      n (prop.)
    Duomo            11227 (0.82)    2185 (0.16)    213 (0.02)    13625
    Gavinana         11464 (0.85)    1928 (0.14)    147 (0.01)    13539
    Santo Spirito     9752 (0.82)    1990 (0.17)    182 (0.02)    11924
    Legnaia          14234 (0.85)    2345 (0.14)    117 (0.01)    16696
    Mantignano       10348 (0.82)    2142 (0.17)    186 (0.02)    12676
    Novoli           14581 (0.85)    2419 (0.14)    228 (0.01)    17228
    Ponte di mezzo    5687 (0.80)    1282 (0.18)    138 (0.02)     7107
    San Jacopino      7727 (0.88)     985 (0.11)     52 (0.01)     8764
    Le Panche         6229 (0.84)    1089 (0.15)     84 (0.01)     7402
    Poggetto         12186 (0.89)    1400 (0.10)     85 (0.01)    13671
    San Gallo         7601 (0.87)    1113 (0.13)     71 (0.01)     8785
    Oberdan          11797 (0.89)    1386 (0.10)     98 (0.01)    13281
    Campo di Marte    7530 (0.90)     810 (0.10)     39 (0.01)     8379
    Coverciano        9065 (0.86)    1377 (0.13)     94 (0.01)    10536

    Table 4. Observed deaths, Standardized Mortality Ratios (SMR), Bayesian relative risks (RR) and 95% credibility interval (CI95%) for non-deprived people. All-cause mortality, Florence 1991-1995, males. Tuscany Longitudinal Study

    city ward        obs   SMR    RR     CI95%
    Duomo            575   1.05   1.03   0.96-1.10
    Gavinana         575   0.92   0.94   0.87-1.00
    Santo Spirito    490   0.98   0.99   0.91-1.06
    Legnaia          720   1.00   1.01   0.94-1.07
    Mantignano       410   1.14   1.10   1.00-1.19
    Novoli           610   1.04   1.04   0.96-1.10
    Ponte di mezzo   341   1.14   1.09   0.99-1.18
    San Jacopino     468   1.10   1.07   0.99-1.15
    Le Panche        289   0.93   0.97   0.88-1.05
    Poggetto         663   0.91   0.94   0.87-1.00
    San Gallo        455   1.00   1.00   0.92-1.07
    Oberdan          689   1.01   1.00   0.93-1.06
    Campo di Marte   425   0.95   0.96   0.89-1.03
    Coverciano       456   0.98   0.98   0.90-1.06

    When individual data are considered, the response variable $Y_{ijk}$ is an indicator of status (dead or alive) joined to a time-to-event variable (i = 1, ..., 163613). We used a Cox proportional hazards regression with attained age as the time axis, allowing for delayed entry (left truncation at the age at entry). We model $\log\theta_{ijk}$, the log relative risk (hazard ratio), specifying the material deprivation index at several levels in the linear predictor.
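    With attained age as the time axis and delayed entry, a subject contributes to the risk set at a death age t only if enrolled before t and still under follow-up at t. The sketch below fits a single-covariate Cox model by Newton-Raphson on the partial likelihood under that rule. It is a self-contained illustration on simulated data (the Gompertz baseline, sample size, effect size and function name are hypothetical), not the software used in the paper; in practice a library implementation would be used, for example lifelines' `CoxPHFitter`, which accepts an `entry_col` for left-truncated data.

    ```python
    import numpy as np

    def cox_delayed_entry(entry, exit_, event, x, n_iter=25, tol=1e-8):
        """Single-covariate Cox PH fit by Newton-Raphson on the partial likelihood.

        Time scale is attained age. entry: age at enrolment (left truncation);
        exit_: age at death or censoring; event: 1 if the death was observed;
        x: covariate. The risk set at a death age t holds subjects with
        entry < t <= exit_. Ties are assumed absent (continuous ages).
        """
        entry, exit_, x = (np.asarray(a, dtype=float) for a in (entry, exit_, x))
        deaths = np.flatnonzero(np.asarray(event) == 1)
        beta, info = 0.0, np.nan
        for _ in range(n_iter):
            w = np.exp(beta * x)
            score, info = 0.0, 0.0
            for f in deaths:
                t = exit_[f]
                at_risk = (entry < t) & (exit_ >= t)
                s0 = w[at_risk].sum()
                m1 = (w[at_risk] * x[at_risk]).sum() / s0       # risk-set weighted mean
                m2 = (w[at_risk] * x[at_risk] ** 2).sum() / s0
                score += x[f] - m1
                info += m2 - m1 ** 2
            step = score / info
            beta += step
            if abs(step) < tol:
                break
        return beta, 1.0 / np.sqrt(info)

    rng = np.random.default_rng(1)
    n, beta_true = 5000, 0.5
    a, b = 5e-5, 0.1                          # hypothetical Gompertz hazard a*exp(b*age)
    x = rng.normal(size=n)                    # standardized covariate
    entry = rng.uniform(40.0, 80.0, size=n)   # age on census day
    # Age at death drawn conditionally on being alive at entry (Gompertz inversion).
    e = rng.exponential(size=n)
    death = np.log(np.exp(b * entry) + b * e / (a * np.exp(beta_true * x))) / b
    end = entry + 5.0                         # roughly five years of follow-up
    exit_ = np.minimum(death, end)
    event = (death <= end).astype(int)
    beta_hat, se = cox_delayed_entry(entry, exit_, event, x)
    print(f"beta_hat = {beta_hat:.3f} (SE {se:.3f}), deaths = {event.sum()}")
    ```

    Ignoring the entry ages (starting everyone at risk from age 0) would put subjects into risk sets at ages when they were not yet observed, which is the bias the delayed-entry rule avoids.
    
    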

    Individual data have also been collapsed, generating counts by five-year age groups and calculating the expected number of deaths by indirect internal age-standardization on the person-years tabulated by deprivation index and census tract (i = 1, ..., 42902). A Poisson regression model has then been used to estimate covariate effects. In particular, we defined as response $Y_{ijk}$ the number of observed events and specified $Y_{ijk} \sim \text{Poisson}(E_{ijk}\theta_{ijk})$, where $\theta_{ijk}$ represents the relative risk for the generic individual i living in the j-th census tract and k-th city ward, and $E_{ijk}$ a population denominator (the expected number of deaths). We then specified a linear model for $\log\theta_{ijk}$ with material deprivation at different levels as predictors.
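    The Poisson model with the expected deaths as offset, $\log \mathrm{E}[Y_{ijk}] = \log E_{ijk} + \beta_0 + \beta_1 x_{ijk}$, can be fitted by iteratively reweighted least squares. A minimal numpy sketch on simulated cells (the function name, cell counts and the true slope 0.07 are hypothetical); any GLM routine that accepts an offset should give the same estimates.

    ```python
    import numpy as np

    def poisson_offset_irls(y, expected, X, n_iter=50, tol=1e-10):
        """Poisson log-linear model with log(expected) as offset, fitted by IRLS.

        y: observed counts; expected: E_ijk; X: design matrix including intercept.
        Returns coefficients and model-based standard errors.
        """
        y = np.asarray(y, dtype=float)
        offset = np.log(np.asarray(expected, dtype=float))
        X = np.asarray(X, dtype=float)
        beta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            eta = X @ beta
            mu = np.exp(offset + eta)
            z = eta + (y - mu) / mu               # working response (offset excluded)
            XtW = X.T * mu                        # X' W with W = diag(mu) for the log link
            beta_new = np.linalg.solve(XtW @ X, XtW @ z)
            done = np.max(np.abs(beta_new - beta)) < tol
            beta = beta_new
            if done:
                break
        mu = np.exp(offset + X @ beta)
        cov = np.linalg.inv((X.T * mu) @ X)       # inverse Fisher information
        return beta, np.sqrt(np.diag(cov))

    rng = np.random.default_rng(0)
    n_cells = 4000
    x = rng.normal(size=n_cells)                      # standardized deprivation score
    expected = rng.uniform(0.5, 5.0, size=n_cells)    # stand-in for E_ijk
    y = rng.poisson(expected * np.exp(0.07 * x))      # true log relative risk 0.07 per SD
    X = np.column_stack([np.ones(n_cells), x])
    coef, se = poisson_offset_irls(y, expected, X)
    print(np.round(coef, 3), np.round(se, 3))
    ```
    
    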

    3.1. Models for individual level

    The linear predictor is formulated in the following way:

    $$\log\theta_{ijk} = \beta_0 + \beta_1 x_{ijk},$$

    where the covariate $x_{ijk}$ is defined for the i-th subject (or cell).

    3.2. Models for aggregate level

    These models are formulated at two different levels of data aggregation, using the covariate $\bar{x}_{jk}$ or $\bar{x}_k$ defined at census-tract or city-ward level. The models are, respectively,

    $$\log\theta_{jk} = \beta_0 + \beta_1\bar{x}_{jk},$$

    $$\log\theta_{k} = \beta_0 + \beta_1\bar{x}_{k}.$$

    These analyses have been performed by means of Poisson regression models for cross-tabulated data.

    3.3. Contextual multilevel models

    These models are specified using both individual ($x_{ijk}$) and aggregate ($\bar{x}_{jk}$, $\bar{x}_k$) covariates in the same regression model:

    $$\log\theta_{ijk} = \beta_0 + \beta_1 x_{ijk} + \beta_2\bar{x}_{jk} + \beta_3\bar{x}_k.$$

    Models of this kind, involving both individual and averaged variables, are called contextual models (Boyd and Iversen, 1979). In epidemiological applications the term contextual has been used more broadly, to refer to any aggregate variable, even in the absence of the corresponding individual-level variable.

    3.4. Cronbach's multilevel models

    The previous models could be unstable due to multicollinearity (the covariates usually exhibit a strong correlation). A simple centering of the deprivation index variables gives rise to Cronbach's model, a multiple regression model in which all the variables are centered. The model becomes:

    $$\log\theta_{ijk} = \beta_0 + \beta_1(x_{ijk} - \bar{x}_{jk}) + \beta_2(\bar{x}_{jk} - \bar{x}_k) + \beta_3(\bar{x}_k - \bar{x})$$
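    Expanding the Cronbach linear predictor shows how its coefficients map onto those of the contextual model of Sect. 3.3, for unstandardized covariates. To keep the two sets apart, write $\gamma_1, \gamma_2, \gamma_3$ for the Cronbach coefficients (a relabelling introduced only for this derivation):

    $$\gamma_1(x_{ijk}-\bar{x}_{jk}) + \gamma_2(\bar{x}_{jk}-\bar{x}_k) + \gamma_3(\bar{x}_k-\bar{x}) = \gamma_1 x_{ijk} + (\gamma_2-\gamma_1)\,\bar{x}_{jk} + (\gamma_3-\gamma_2)\,\bar{x}_k - \gamma_3\bar{x}.$$

    Matching terms with $\beta_1 x_{ijk} + \beta_2\bar{x}_{jk} + \beta_3\bar{x}_k$ (the constant $-\gamma_3\bar{x}$ is absorbed into the intercept) gives

    $$\gamma_1 = \beta_1, \qquad \gamma_2 = \beta_1 + \beta_2, \qquad \gamma_3 = \beta_1 + \beta_2 + \beta_3,$$

    so each higher-level Cronbach coefficient accumulates the individual effect and the contextual effects up to that level. When every covariate is divided by its own standard deviation, as in this paper, these identities hold only after rescaling the coefficients back to the original units.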

    This model, proposed in the analysis of educational data in the late seventies, gives a convenient interpretation of the model parameters and has not yet been widely used in the epidemiological literature. Although exact algebraic correspondence is valid for Gaussian linear models only, covariance decomposition applies to this model (Sheppard, 2003). Cronbach's and contextual models can in general be compared in the non-linear case (see Sheppard, 2003); only when a pure ecological model is fitted (i.e. a model with only aggregate response and explanatory variables) do we lose perfect algebraic comparability. The aggregate regression coefficients $\{\beta_2, \beta_3\}$ are not confounded by the individual-level covariate $x_{ijk}$. Cox and Poisson regression models have been fitted as for the previous models. Poisson regression has also been performed within a hierarchical multilevel Bayesian approach, as follows.
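    The multicollinearity argument for centering can be checked numerically. The sketch below simulates individuals nested in tracts nested in wards (all sizes, variances and names are hypothetical), computes tract and ward means, and compares the correlations among the contextual covariates with those among the Cronbach-centred terms. When the group means are computed from the same individuals, the centred terms are uncorrelated by construction, and they sum back to $x_{ijk} - \bar{x}$.

    ```python
    import numpy as np

    def group_mean(values, groups):
        """Mean of `values` within each group, broadcast back to the individuals."""
        groups = np.asarray(groups)
        sums = np.bincount(groups, weights=values)
        counts = np.bincount(groups)
        return (sums / counts)[groups]

    rng = np.random.default_rng(42)
    n_wards, tracts_per_ward, people_per_tract = 14, 20, 60     # hypothetical sizes
    ward = np.repeat(np.arange(n_wards), tracts_per_ward * people_per_tract)
    tract = np.repeat(np.arange(n_wards * tracts_per_ward), people_per_tract)
    # Deprivation score with ward-level, tract-level and individual variation.
    x = (rng.normal(0, 0.3, n_wards)[ward]
         + rng.normal(0, 0.3, n_wards * tracts_per_ward)[tract]
         + rng.normal(0, 1.0, ward.size))

    x_tract = group_mean(x, tract)    # \bar{x}_{jk}
    x_ward = group_mean(x, ward)      # \bar{x}_k
    x_all = x.mean()                  # \bar{x}

    raw = np.column_stack([x, x_tract, x_ward])
    centred = np.column_stack([x - x_tract, x_tract - x_ward, x_ward - x_all])

    print("contextual covariates:\n", np.round(np.corrcoef(raw, rowvar=False), 2))
    print("Cronbach covariates:\n", np.round(np.corrcoef(centred, rowvar=False), 2))
    ```

    Dividing each term by its standard deviation, as done in the paper, does not change these correlations.
    
    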

    3.5. Hierarchical multilevel Bayesian Models

    A Bayesian model has been specified to take into account the hierarchies implied in the data, where individuals are grouped by census tracts and city wards. By means of hierarchical Bayesian modelling we are able to consider multiple sources of variability at the same time, possibly including spatial dependence among neighbouring census tracts or city wards. Indeed, the previous regression approaches fail to estimate the uncertainty in the effect estimates of higher-level covariates (see Goldstein 1995 and Greenland 2002 for an epidemiological perspective).

    Hierarchical Bayesian models have been specified on the number of observed and expected events under internal age-standardization by deprivation index and census tract:

    $$Y_{ijk} \sim \text{Poisson}(E_{ijk}\theta_{ijk}),$$

    where $\theta_{ijk}$ is the relative risk for the generic individual i, living in the j-th census tract and k-th city ward, with a given degree of material deprivation, as mentioned before.

    A simple regression model for the relative risk consists of separate random intercepts for each area unit (census tract/ward). The intercepts can be parameterized as realizations of random variables with fixed zero means and unknown variances. The model becomes

    $$\log\theta_{ijk} = \beta_{0jk} + \beta_1 x_{ijk},$$

    where the random coefficients are assumed to follow a known parametric distribution, for example $\beta_{0jk} \sim \text{Normal}(0, \tau_0^{-1})$.
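    To make the random-intercept formulation concrete, the sketch below simulates counts from it: one intercept $\beta_{0jk}$ per census tract drawn from Normal(0, $\tau_0^{-1}$), a common slope $\beta_1$, and Poisson counts with mean $E_{ijk}\theta_{ijk}$. All numbers (tracts, cells per tract, $\tau_0$, $\beta_1$, the range of $E_{ijk}$) and the function name are hypothetical; fitting the model would require MCMC or a comparable method, which is not shown.

    ```python
    import numpy as np

    def simulate_random_intercepts(n_tracts, cells_per_tract, tau0, beta1, rng):
        """Draw Y_ijk ~ Poisson(E_ijk * theta_ijk) with
        log theta_ijk = beta0_jk + beta1 * x_ijk and beta0_jk ~ Normal(0, 1/tau0)."""
        tract = np.repeat(np.arange(n_tracts), cells_per_tract)
        beta0 = rng.normal(0.0, 1.0 / np.sqrt(tau0), size=n_tracts)  # tau0 is a precision
        x = rng.normal(size=tract.size)                    # standardized deprivation
        expected = rng.uniform(0.5, 5.0, size=tract.size)  # stand-in for E_ijk
        theta = np.exp(beta0[tract] + beta1 * x)
        y = rng.poisson(expected * theta)
        return tract, x, expected, y, beta0

    rng = np.random.default_rng(7)
    tract, x, expected, y, beta0 = simulate_random_intercepts(
        n_tracts=500, cells_per_tract=15, tau0=100.0, beta1=0.06, rng=rng)
    # Tract SMRs track exp(beta0_jk): between-tract variability that a single
    # fixed intercept would ignore.
    smr = np.bincount(tract, weights=y) / np.bincount(tract, weights=expected)
    r = np.corrcoef(np.log(smr), beta0)[0, 1]
    print(f"corr(log SMR_jk, beta0_jk) = {r:.2f}")
    ```
    
    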

    Alternatively, both the intercepts and the slopes can be parameterized as realizations of random variables with fixed mean and unknown covariance matrix:

    $$\log\theta_{ijk} = \beta_{0jk} + \beta_{1jk} x_{ijk}, \qquad (\beta_{0jk}, \beta_{1jk}) \sim \text{Multivariate Normal}(\mu, T^{-1}).$$

    Both are examples of general ANCOVA (analysis of covariance) models, which could be used to obtain unbiased estimates of the effect of the individual-level covariate while adjusting for the hierarchical nature of the data (subjects within census tracts within city wards). However, these models are highly parameterized and, more parsimoniously, between-area variability could be explained by aggregate-level covariates.

    Different models, depending on the assumed structure of the random-effect terms, have been specified:

    a) Not spatially structured random intercepts and slopes for each city ward.

    b) Spatially dependent (using a Gaussian conditional autoregressive model, see Bernardinelli et al., 1995) random intercepts and slopes for each city ward.

    c) Spatially dependent random intercepts and slopes for each city ward and not spatially structured random intercepts for each census tract.

    d) Random intercepts and slopes for each city ward and random intercepts for each census tract (both spatially unstructured). This last model is

    $$\log\theta_{ijk} = (\beta_{0k} + \beta_{4j}) + \beta_{1k}(x_{ijk} - \bar{x}_{jk}) + \beta_2(\bar{x}_{jk} - \bar{x}_k) + \beta_3(\bar{x}_k - \bar{x})$$

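    with prior distributions Normal(0, 10000) for the fixed coefficients $\beta_2$ and $\beta_3$; prior distributions for each k-th pair of random coefficients

    $$(\beta_{0k}, \beta_{1k}) \sim \text{Multivariate Normal}(\mu, T^{-1}),$$

    and a Normal prior for each j-th random coefficient

    $$\beta_{4j} \sim \text{Normal}(\mu_4, \tau_4^{-1}).$$

    Hyperpriors for $\mu$ and $T$ are, respectively,

    $$\mu \sim \text{Multivariate Normal}\left(\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}0.0001 & 0\\ 0 & 0.0001\end{pmatrix}^{-1}\right)$$

    and

    $$T \sim \text{Wishart}\left(\begin{pmatrix}0.1 & 0.005\\ 0.005 & 0.01\end{pmatrix}^{-1}, 2\right).$$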

    The hyperprior for $\mu_4$ is Normal(0, 10000) and that for $\tau_4$ is Gamma(0.001, 10000). These priors and hyperpriors can be regarded as non-informative, since they have a very large variance. In the absence of prior knowledge, the prior distribution can be chosen to be vague; the prior then has only a negligible influence on the results, and the posterior has nearly the same shape as the likelihood (for a review of non-informative prior distributions in Bayesian inference see Kass and Wasserman, 1996).

    Models (a) and (b) ignore the census-tract level. The former assumes exchangeable random terms, while the latter specifies a conditionally autoregressive structure among city wards; this assumption is more realistic, as could be argued from Fig. 3. Model (c) introduces a spatially unstructured census-tract level. Spatial dependence at the lower level (census tract) has not been considered, because it is assumed to be captured by the spatial dependence defined at the higher level (city ward): the shapes of census tracts and city wards suggest that a spatial structure based on area adjacencies is more appropriate for the city-ward subdivision. Finally, in model (d) both census-tract and city-ward effects are spatially unstructured.

    Fig. 2. The 1991 census tracts of the City of Florence (Tuscany, Italy)

    Fig. 3. Relative risk for all-cause mortality, Florence 1991-1995, males. Tuscany Longitudinal Study
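    Model (b) replaces exchangeable ward effects with a Gaussian conditional autoregressive (CAR) prior. As a sketch of what that assumes, the code below builds the precision matrix of an intrinsic CAR, $\tau(D - W)$, with $W$ the 0/1 adjacency matrix and $D$ the diagonal matrix of neighbour counts, and recovers the conditional distribution of one area given the others: its mean is the average of its neighbours. The intrinsic form is an assumption (the exact CAR variant of Bernardinelli et al., 1995 is not spelled out here), and the five-area map and effect values are hypothetical, not the Florence wards.

    ```python
    import numpy as np

    def icar_precision(adjacency, tau):
        """Precision matrix tau * (D - W) of an intrinsic Gaussian CAR prior."""
        W = np.asarray(adjacency, dtype=float)
        D = np.diag(W.sum(axis=1))
        return tau * (D - W)

    def conditional_mean_sd(Q, values, area):
        """Mean and sd of one area given all others, for a Gaussian field with precision Q.

        mean = -sum_{j != i} Q_ij v_j / Q_ii, variance = 1 / Q_ii.
        """
        others = np.arange(Q.shape[0]) != area
        mean = -Q[area, others] @ values[others] / Q[area, area]
        return mean, 1.0 / np.sqrt(Q[area, area])

    # Hypothetical 5-area map: area 0 borders 1 and 2; area 1 borders 0, 2, 3; ...
    W = np.array([[0, 1, 1, 0, 0],
                  [1, 0, 1, 1, 0],
                  [1, 1, 0, 1, 1],
                  [0, 1, 1, 0, 1],
                  [0, 0, 1, 1, 0]])
    Q = icar_precision(W, tau=10.0)
    v = np.array([0.10, -0.05, 0.02, 0.08, -0.01])   # hypothetical ward effects
    mean2, sd2 = conditional_mean_sd(Q, v, area=2)
    print(mean2, sd2)
    ```

    The rows of $\tau(D - W)$ sum to zero, so this prior is improper and only defines differences between areas; in practice it is combined with an intercept or a sum-to-zero constraint.
    
    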

    Table 5. Log-relative risks and standard errors for standardized scores (relative effects) of the deprivation index obtained by different models (see text). All causes, Florence 1991-1995, males. Tuscany Longitudinal Study

    model        covariate                                  individual data   cross-tabulated data
    individual   individual $x_{ijk}$                       0.058 (0.010)     0.076 (0.010)
    aggregate    census tract $\bar{x}_{jk}$                -                 0.066 (0.010)
    aggregate    ward $\bar{x}_k$                           -                 0.028 (0.011)
    contextual   individual $x_{ijk}$                       0.040 (0.011)     0.061 (0.010)
                 census-tract average $\bar{x}_{jk}$        0.057 (0.011)     0.045 (0.011)
                 ward average $\bar{x}_k$                   0.013 (0.011)     0.009 (0.011)
    Cronbach     individual $(x_{ijk}-\bar{x}_{jk})$        0.037 (0.010)     0.058 (0.010)
                 census tract $(\bar{x}_{jk}-\bar{x}_k)$    0.067 (0.010)     0.061 (0.010)
                 ward $(\bar{x}_k-\bar{x})$                 0.032 (0.011)     0.026 (0.011)

    Table 6. Hierarchical Bayesian model estimates for the fixed coefficients, expected posterior deviance (EPoD), expected predictive deviance (EPD) and model complexity; standard deviations in parentheses. All causes, Florence 1991-1995, males. Tuscany Longitudinal Study. The lowest EPoD and EPD are those of model (d)

    model   $(\bar{x}_{jk}-\bar{x}_k)$   $(\bar{x}_k-\bar{x})$   EPoD               EPD                complexity
    (a)     0.0636 (0.0101)              0.0405 (0.0279)         13983.10 (53.20)   28702.77 (285.16)  14719.67
    (b)     0.0636 (0.0101)              0.0385 (0.0327)         13979.19 (51.58)   28696.32 (283.23)  14717.13
    (c)     0.0621 (0.0114)              0.0308 (0.0348)         13418.14 (69.69)   28237.55 (289.22)  14819.41
    (d)     0.0625 (0.0113)              0.0292 (0.0300)         13401.50 (68.78)   28222.20 (284.09)  14820.70

    (a) Not spatially structured random intercepts and slopes for each city ward. (b) Spatially dependent random intercepts and slopes for each city ward. (c) Spatially dependent random intercepts and slopes for each city ward and not spatially structured random intercepts for each census tract. (d) Random intercepts and slopes for each city ward and random intercepts for each census tract (both spatially unstructured).

    Model comparison has been performed using the expected predictive deviance (EPD) criterion:

    $$2\sum_{ijk}\left[(Y_{ijk}+0.05)\log\frac{Y_{ijk}+0.05}{Y^{*}_{ijk}+0.05} - Y_{ijk} + Y^{*}_{ijk}\right],$$

    where the predicted data $Y^{*}_{ijk}$ are sampled from a Poisson($E_{ijk}\theta_{ijk}$) and the $\theta_{ijk}$ are the estimates obtained from the posterior distributions. The EPD measures the discrepancy between the observed and predicted data, and can be expressed (see Gelfand and Ghosh, 1998) as the sum of a goodness-of-fit term (the Expected Posterior Deviance, EPoD) and a penalty term for model complexity.
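    The predictive deviance above can be evaluated directly from MCMC output. The sketch assumes an array of posterior draws of $\theta_{ijk}$ (generated artificially here, since no sampler output is available), draws one replicate $Y^{*}_{ijk} \sim$ Poisson($E_{ijk}\theta_{ijk}$) per posterior draw, and averages the deviance over draws; taking the expectation this way is an assumption, and the function names are hypothetical. The decomposition is visible in Table 6, where complexity equals EPD minus EPoD for every model (for model (d), 28222.20 - 13401.50 = 14820.70).

    ```python
    import numpy as np

    def predictive_deviance(y, y_rep, c=0.05):
        """2 * sum[(y+c) log((y+c)/(y_rep+c)) - y + y_rep]; each term is >= 0."""
        y = np.asarray(y, dtype=float)
        y_rep = np.asarray(y_rep, dtype=float)
        return 2.0 * np.sum((y + c) * np.log((y + c) / (y_rep + c)) - y + y_rep)

    def expected_predictive_deviance(y, expected, theta_draws, rng):
        """Mean and sd of the predictive deviance over posterior draws (draws x cells)."""
        devs = [predictive_deviance(y, rng.poisson(expected * theta))
                for theta in theta_draws]
        return float(np.mean(devs)), float(np.std(devs))

    rng = np.random.default_rng(3)
    n_cells, n_draws = 1000, 200
    expected = rng.uniform(0.5, 5.0, n_cells)
    true_theta = np.exp(rng.normal(0.0, 0.1, n_cells))
    y = rng.poisson(expected * true_theta)
    # Artificial stand-in for MCMC output: draws scattered around the true risks.
    theta_draws = true_theta * np.exp(rng.normal(0.0, 0.05, (n_draws, n_cells)))
    epd_mean, epd_sd = expected_predictive_deviance(y, expected, theta_draws, rng)
    print(round(epd_mean, 1), round(epd_sd, 1))
    ```
    
    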

    4. Results

    The logarithms of the relative risks and their standard errors obtained from the Cox model and from Poisson regression for each level of data aggregation are reported in Table 5.

    The individual-level analysis provides only effect estimates of individual-level covariates. If contextual effects are at work, those estimates are biased. In the case of the linear Gaussian model it can be proved that the bias is the size of the contextual-level effect times the ratio between the variance of $\bar{x}_j$ and the variance of $x_{ij}$.

    Table 7. Individual effects (constant and coefficient) and mean deprivation of each city ward ($\bar{x}_k$) in model (d). All causes, Florence 1991-1995, males. Tuscany Longitudinal Study. For the less deprived wards (lower $\bar{x}_k$) the individual effect is greater

    city ward        constant            coefficient of $(x_{ijk}-\bar{x}_{jk})$   $\bar{x}_k$
    Duomo            0.0906 (0.14023)    0.0179 (0.02723)                          0.191633
    Gavinana         0.1439 (0.14053)    0.0737 (0.02701)                          0.164119
    Santo Spirito    0.1059 (0.14566)    0.0805 (0.02720)                          0.197417
    Legnaia          0.1115 (0.13873)    0.0694 (0.02605)                          0.154468
    Mantignano       0.0440 (0.14147)    0.0360 (0.02828)                          0.198328
    Novoli           0.1018 (0.14001)    0.0408 (0.02755)                          0.166880
    Ponte di mezzo   0.0661 (0.14883)    0.0387 (0.03241)                          0.219221
    San Jacopino     0.0065 (0.14391)    0.0601 (0.03180)                          0.124258
    Le Panche        0.1771 (0.14585)    0.0501 (0.03540)                          0.169819
    Poggetto         0.1388 (0.14362)    0.0774 (0.03157)                          0.114842
    San Gallo        0.0709 (0.14160)    0.0462 (0.03395)                          0.142857
    Oberdan          0.0934 (0.14643)    0.0230 (0.03426)                          0.119127
    Campo di Marte   0.0910 (0.15350)    0.0877 (0.03987)                          0.105980
    Coverciano       0.1546 (0.14440)    0.0146 (0.03474)                          0.148538

    Table 8. Precision matrix T for the individual effects of each city ward in Bayesian model (d): mean and standard deviation of the posterior distribution

    precision element   posterior mean (standard deviation)
    $T_{0k,0k}$         97.4502 (39.53246)
    $T_{1k,1k}$         490.0663 (238.7085)
    $T_{0k,1k}$         12.92921 (63.99769)

    On the contrary, the aggregate-level analysis provides unbiased estimates of the overall effect of X (in the linear case, the sum of the true individual and contextual effects). It should be noticed that the bias of the individual-level effect estimates reflects the importance of properly accounting for the hierarchies in the data. This bias is the opposite of the ecological fallacy, which arises when the effect estimates obtained by aggregate-level analysis are used as an approximation to the true individual effects. From Table 5 (cross-tabulated data column), the overall effect is estimated at 0.066 by the aggregate model at census-tract level, while the individual-level model gives an overestimated coefficient of 0.076.

    In principle, contextual models provide unbiased estimates of the true individual effect. Note that we obtain unbiased estimates of the individual effect only if the analysis includes the census-tract level. The general rule is that estimates of the effect of individual covariates are biased unless the appropriate aggregate level of analysis is specified. The true individual effect is estimated at 0.040 by the Cox model and 0.061 by Poisson regression. These compare to 0.058 and 0.076, respectively, when individual models were fitted.
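    The three statements above can be checked in the linear Gaussian case with a small simulation. Individuals are grouped into tracts, the outcome has an individual effect $\beta_1$ on $x_{ij}$ and a contextual effect $\beta_2$ on the tract mean $\bar{x}_j$, and three least-squares fits are compared: individual-only, aggregate (tract means of the outcome on $\bar{x}_j$) and contextual. All names and parameter values are hypothetical. The expected pattern is an individual-only slope of about $\beta_1 + \beta_2\,\mathrm{Var}(\bar{x}_j)/\mathrm{Var}(x_{ij})$, an aggregate slope of about $\beta_1 + \beta_2$, and a contextual fit recovering $\beta_1$ and $\beta_2$.

    ```python
    import numpy as np

    def ols(X, y):
        """Least-squares coefficients via numpy.linalg.lstsq."""
        return np.linalg.lstsq(X, y, rcond=None)[0]

    rng = np.random.default_rng(11)
    n_tracts, m = 2000, 50                  # hypothetical: tracts, people per tract
    beta0, beta1, beta2 = 0.0, 0.04, 0.06   # hypothetical individual/contextual effects
    tract = np.repeat(np.arange(n_tracts), m)
    x = rng.normal(0, 0.5, n_tracts)[tract] + rng.normal(0, 1.0, tract.size)
    xj = np.bincount(tract, weights=x) / m  # tract means, one per tract
    x_bar = xj[tract]                       # tract mean attached to each person
    y = beta0 + beta1 * x + beta2 * x_bar + rng.normal(0, 1.0, tract.size)
    ones = np.ones_like(x)

    b_ind = ols(np.column_stack([ones, x]), y)[1]           # individual-only model
    b_ctx = ols(np.column_stack([ones, x, x_bar]), y)[1:]   # contextual model
    yj = np.bincount(tract, weights=y) / m
    b_agg = ols(np.column_stack([np.ones(n_tracts), xj]), yj)[1]  # aggregate model

    ratio = xj.var() / x.var()              # Var(x_bar_j) / Var(x_ij), equal tract sizes
    print("individual-only:", round(b_ind, 4), "theory:", round(beta1 + beta2 * ratio, 4))
    print("aggregate:      ", round(b_agg, 4), "theory:", beta1 + beta2)
    print("contextual:     ", np.round(b_ctx, 4))
    ```
    
    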

    Cronbach's models provide unbiased estimates of individual effects as well. The effect estimates of the aggregate variables are comparable to those obtained by fitting the model to aggregate data (0.061 and 0.026, compared to 0.066 and 0.028, for census tract and city ward respectively).

    Fig. 4. Posterior deviance distributions for the hierarchical Bayesian models (densities for models (a)-(d); horizontal axis: posterior deviance, about 13000 to 14500)

    The Bayesian multilevel approach must be adopted to ensure the validity of effect estimates and of their precision. Effect estimates for the multilevel hierarchical Bayesian models (a)-(d) are reported in Table 6. We note that the estimates of contextual effects are very close to those obtained by the Cox model and Poisson regression (Table 5), but with larger standard errors, as expected (using model (d) we obtain 0.0625, standard error 0.0113, for the census-tract level and 0.0292, standard error 0.030, for the city-ward level). Hierarchical Bayesian models properly address multiple sources of variability, with special regard to the uncertainty of effect estimates of higher-level covariates (see Sect. 3.5). For each level of the hierarchy, the underestimation of standard errors depends on the number of clusters (the fewer the clusters, the larger the underestimation) and on the ratio between the between- and within-cluster variance components. The reader is invited to note the large change in the size of the standard errors for the covariate defined at ward level (only 14 wards) and the minor change for the covariate defined at census-tract level (2752 tracts).
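    A rough way to see why the ward-level standard error grows much more than the tract-level one is Kish's design effect for cluster samples, $\mathrm{deff} = 1 + (m-1)\rho$, where $m$ is the average number of individuals per cluster and $\rho$ the intra-cluster correlation; naive standard errors are too small by about $\sqrt{\mathrm{deff}}$. This textbook approximation is swapped in for illustration and is not the model-based computation of the paper. The cohort size and cluster counts come from Sect. 2 (163613 men, 14 wards, 2752 tracts); the value of $\rho$, and using the same value at both levels, are hypothetical.

    ```python
    import math

    def se_inflation(n_total, n_clusters, rho):
        """sqrt of Kish's design effect 1 + (m - 1) * rho, with m = n_total / n_clusters."""
        m = n_total / n_clusters
        return math.sqrt(1.0 + (m - 1.0) * rho)

    n = 163613
    rho = 0.001   # hypothetical intra-cluster correlation, same at both levels
    for level, clusters in [("ward", 14), ("census tract", 2752)]:
        m = n / clusters
        print(f"{level:13s}: m = {m:8.1f}, SE inflation ~ {se_inflation(n, clusters, rho):.2f}")
    ```

    Even a tiny $\rho$ matters when clusters hold thousands of people (about 11687 per ward) and hardly matters for clusters of about 59 people, which is the same ordering as the change in standard errors from Table 5 to model (d).
    
    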

    The selection among the fitted Bayesian models has been made using EPD values. The mean and standard deviation of the posterior and predictive deviance distributions for the models considered are also reported in Table 6; the posterior deviance distributions for models (a)-(d) are shown in Fig. 4 and the predictive deviance distributions in Fig. 5. Introducing the census-tract level decreased the posterior deviance substantially, much more than the increase in model complexity; model (d) was the best.

    The selected model was therefore the model with only spatially unstructured effects for city wards and census tracts. For this model the estimated individual effects by city ward are shown in Fig. 6. The means and standard deviations of the posterior distributions of the individual effects are shown in Table 7, together with the city-ward average deprivation index. The means and standard deviations of the posterior distributions of the hyperparameters contained in the precision matrix T are shown in Table 8.

    Fig. 5. Predictive deviance distributions for the hierarchical Bayesian models (densities for models (a)-(d); horizontal axis: predictive deviance, about 27000 to 30000)

    Fig. 6. Individual effects for each city ward on hierarchical Bayesian model (d) (wards: Duomo, Gavinana, Santo Spirito, Legnaia, Mantignano, Novoli, Ponte di Mezzo, San Jacopino, Le Panche, Poggetto, San Gallo, Oberdan, Campo di Marte, Coverciano; horizontal axis from -4 to 4, vertical axis from -0.2 to 0.2)


    5. Conclusions

    A class of models for contextual analyses (when covariates are measured at the individual and aggregate level) is reviewed. We suggest the models that include centering of covariates (Cronbach's model) and a multilevel Bayesian approach, to cope with random effects and to estimate precision parameters consistently. Various alternative models have been discussed, with special emphasis on spatial random-effects models. Bayesian model comparison is performed using measures that take model complexity into account. The example highlights the difficulties and biases of simple analyses conducted at only one level of the hierarchy.

    Any analysis of individual-level variables that ignores the multilevel structure of the data will give biased results unless the contextual effect is null. Conversely, an analysis conducted only at the aggregate level gives biased standard errors, and it could give biased point estimates if the contextual effect is null and ecological confounding or effect modification is present. However, it will estimate the overall effect (contextual plus individual), provided no confounding is acting.
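    A small simulation can illustrate these two points. It is a hypothetical linear-Gaussian sketch, not the paper's mortality model; the data-generating values and the helper names `simulate_two_level` and `ols_slope` are assumptions. Individual outcomes depend on the individual covariate with slope beta_I and on the area mean with slope beta_C. Regressing individual outcomes on individual covariates while ignoring areas recovers neither beta_I nor beta_I + beta_C, whereas regressing area means on area means recovers the overall effect beta_I + beta_C.

```python
import numpy as np

def simulate_two_level(beta_i, beta_c, n_groups=200, n_per=50, seed=1):
    """Hypothetical linear two-level data: y_ij = beta_i*x_ij + beta_c*xbar_j + e_ij."""
    rng = np.random.default_rng(seed)
    group = np.repeat(np.arange(n_groups), n_per)
    # covariate with an area-level and an individual-level component
    x = rng.normal(0.0, 1.0, n_groups)[group] + rng.normal(0.0, 1.0, group.size)
    xbar = np.bincount(group, weights=x) / n_per          # area means of x
    y = beta_i * x + beta_c * xbar[group] + rng.normal(0.0, 1.0, group.size)
    ybar = np.bincount(group, weights=y) / n_per          # area means of y
    return x, y, xbar, ybar

def ols_slope(x, y):
    """Slope of the ordinary least-squares line of y on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

x, y, xbar, ybar = simulate_two_level(beta_i=0.5, beta_c=0.6)
b_individual = ols_slope(x, y)       # individual-level analysis, areas ignored
b_aggregate = ols_slope(xbar, ybar)  # ecological regression on area means
print(f"individual-level slope: {b_individual:.3f}")
print(f"aggregate-level slope:  {b_aggregate:.3f}  (beta_I + beta_C = 1.1)")
```

    With these values the individual-level slope should fall between beta_I and beta_I + beta_C (roughly 0.8, because the area component carries about half of the covariate variance), while the aggregate slope should be close to 1.1.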

    Indeed, when the data hierarchy is taken into account and only an individual-level effect is present, the aggregate-level effect in Cronbach's model would equal the individual-level effect, provided no ecological bias is at work.

    When only an aggregate-level effect is present, the individual-level effect in Cronbach's model, or in the contextual model, would be close to the null value.
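    The cases just described follow from a reparameterisation of the linear predictor. The sketch below uses our own generic notation, which may differ from the symbols used earlier in the paper: for subject i in area j, x_ij is the covariate, \bar{x}_j its area mean, and \eta_{ij} the linear predictor (for example, a log-rate).

```latex
\begin{aligned}
\text{contextual model:}\quad & \eta_{ij} = \alpha + \beta_I\, x_{ij} + \beta_C\, \bar{x}_j \\
\text{Cronbach's model:}\quad & \eta_{ij} = \alpha + \beta_W\,(x_{ij} - \bar{x}_j) + \beta_B\, \bar{x}_j
  = \alpha + \beta_W\, x_{ij} + (\beta_B - \beta_W)\,\bar{x}_j \\
\Rightarrow\quad & \beta_W = \beta_I, \qquad \beta_B = \beta_I + \beta_C \\
\beta_C = 0:\quad & \beta_B = \beta_W = \beta_I \quad \text{(aggregate effect equals individual effect)} \\
\beta_I = 0:\quad & \beta_W = 0 \quad \text{(individual effect at the null value)}
\end{aligned}
```

    The identity holds exactly on the linear-predictor scale; the aggregation subtleties of the non-linear case concern analyses based on area-level data, not this reparameterisation.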

    In other cases, the aggregate-level effect in Cronbach's model can be interpreted as the overall covariate effect, that is, the sum of the individual and contextual effects. Explaining the causal mechanism behind contextual effects is a matter for subject-specific research. Several subtleties remain, especially in the non-linear case, that are beyond the scope of this paper; the interested reader is referred to Sheppard (2003).
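    These identities can also be checked numerically. The following is a hypothetical linear simulation (Gaussian outcome, not the paper's mortality model; the function names `simulate` and `fit_cronbach` and all parameter values are illustrative assumptions) that fits Cronbach's centred model by ordinary least squares under the three scenarios discussed: individual effect only, contextual effect only, and both.

```python
import numpy as np

def simulate(beta_i, beta_c, n_groups=200, n_per=50, seed=2):
    """Hypothetical linear two-level data:
    y_ij = 1 + beta_i * x_ij + beta_c * xbar_j + e_ij, with e_ij ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    group = np.repeat(np.arange(n_groups), n_per)
    # covariate with an area-level and an individual-level component
    x = rng.normal(0.0, 1.0, n_groups)[group] + rng.normal(0.0, 1.0, group.size)
    xbar = (np.bincount(group, weights=x) / n_per)[group]  # area mean, per subject
    y = 1.0 + beta_i * x + beta_c * xbar + rng.normal(0.0, 1.0, group.size)
    return x, xbar, y

def fit_cronbach(x, xbar, y):
    """OLS fit of y = a + b_w * (x - xbar) + b_b * xbar; returns (b_w, b_b)."""
    X = np.column_stack([np.ones_like(x), x - xbar, xbar])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    return coef[1], coef[2]

scenarios = {
    "individual effect only": (0.5, 0.0),
    "contextual effect only": (0.0, 0.5),
    "both effects": (0.5, 0.3),
}
results = {}
for name, (beta_i, beta_c) in scenarios.items():
    b_w, b_b = fit_cronbach(*simulate(beta_i, beta_c))
    results[name] = (b_w, b_b)
    print(f"{name}: within = {b_w:.3f} (beta_I = {beta_i}), "
          f"between = {b_b:.3f} (beta_I + beta_C = {beta_i + beta_c})")
```

    In each scenario the within-area coefficient should track beta_I and the between-area coefficient should track beta_I + beta_C, which is the interpretation of the aggregate-level effect as the overall effect.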

    In the Florence census cohort 1991–1995, material deprivation appeared to be strongly associated with all-cause mortality. Our findings suggest complex patterns of association between deprivation and mortality, involving both individual and small-area effects. The analysis restricted to sub-groups of the population (most or least deprived) suggested some contribution of contextual effects at census-tract level. Using aggregate data at census level will give an unbiased estimate of the overall effect, provided no confounding is active.

    References

    Anderson RT, Sorlie P, Backlund E, Johnson N, Kaplan GA (1997) Mortality Effects of Community Socioeconomic Status. Epidemiology 8, 42–47

    Bernardinelli L, Clayton D, Pascutto C, Montomoli C, Ghislandi M, Songini M (1995) Bayesian analysis of space–time variation in disease risk. Statistics in Medicine 14, 2433–2443

    Besag J, York J, Mollié A (1991) Bayesian image restoration, with two applications in spatial statistics (with discussion). Annals of the Institute of Statistical Mathematics 43, 1–59

    Biggeri A, Gorini G, Dreassi E, Kalala N, Lisi C (2001) Condizione socio-economica e mortalità in Toscana. Studi e Ricerche, n. 7, Edizioni Regione Toscana, Centro Stampa Giunta Regionale, Firenze

    Boyd LH, Iversen GR (1979) Contextual Analysis: Concepts and Statistical Techniques. Belmont, CA: Wadsworth


    Cronbach LJ, Webb N (1975) Between Class and Within Class Effects in a Reported Aptitude × Treatment Interaction: A Reanalysis of a Study by G.L. Anderson. Journal of Educational Psychology 67, 717–724

    Davey Smith G, Hart C, Watt G, Hole D, Hawthorne V (1998) Individual social class, area-based deprivation, cardiovascular disease risk factors, and mortality: the Renfrew and Paisley study. Journal of Epidemiology & Community Health 52, 399–405

    Diez-Roux AV (1998) Bringing context back into epidemiology: variables and fallacies in multilevel analysis. American Journal of Public Health 88, 216–222

    Eachus J, Williams M, Chan P, Davey Smith G, Grainge M, Donovan J, Frankel S (1996) Deprivation and cause specific morbidity: evidence from the Somerset and Avon survey of health. British Medical Journal 312, 287–292

    Firebaugh G (1978) A rule for inferring individual-level relationships from aggregate data. American Sociological Review 43, 557–572

    Gelfand AE, Ghosh SK (1998) Model choice: a minimum posterior predictive loss approach. Biometrika 85, 1–11

    Goldstein H (1995) Multilevel Statistical Models. Second Edition, London: Edward Arnold

    Greenland S (2002) A review of multilevel model theory for ecologic analyses. Statistics in Medicine 21, 389–395

    Jarman B (1983) Identification of underprivileged areas. British Medical Journal 1705–1709

    Kass RE, Wasserman L (1996) The Selection of Prior Distributions by Formal Rules. Journal of the American Statistical Association 91, 1343–1370

    Kreft IGG, de Leeuw J, Aiken L (1995) The Effect of Different Forms of Centering in Hierarchical Linear Models. Multivariate Behavioral Research 30, 1–22

    Morris R, Carstairs V (1991) Which deprivation? A comparison of selected deprivation indexes. Journal of Public Health Medicine 13, 318–326

    Pell JP, Pell ACH, Norrie J, Ford I, Cobbe SM (2000) Effect of socioeconomic deprivation on waiting time for cardiac surgery: retrospective cohort study. British Medical Journal 320, 15–18

    Sheppard L (2003) Insight on bias and information in group-level studies. Biostatistics 4, 265–278

    St Leger S (Ed.) (1995) Use of deprivation indices in small area studies of environment and health. Journal of Epidemiology & Community Health 49 (Suppl. 2), 1–88

    Sundquist J, Malmström M, Johansson SE (1999) Cardiovascular Risk Factors and the Neighbourhood Environment. International Journal of Epidemiology 28, 841–845

    Townsend P, Phillimore P, Beattie A (1988) Health and deprivation: inequalities and the north. London: Croom Helm