Performance Assessment in the Context of Multiple
Objectives: A Multivariate Multilevel Analysis
Katharina Hauck^a, Andrew Street^b*
^a Centre for Health Economics, Monash University, Clayton, Victoria 3800, Australia
^b Centre for Health Economics, University of York, York YO10 5DD, England
Katharina Hauck: [email protected] Andrew Street: [email protected]
* Corresponding author
Acknowledgements
The authors are grateful to the English Department of Health for financial support. Earlier versions of this paper were presented at the 1st Franco-British Meeting in Health Economics, Paris January 2004 and the 5th iHEA World Congress, Barcelona July 2005. We are grateful for comments from Tony Culyer, Yves-Antoine Flor, Giovanni Forchini, George Leckie, Simon Peck, Nigel Rice, Peter C Smith, participants in the York Seminars in Health Econometrics, and the journal’s referees. The views expressed are those of the authors only.
Abstract
The pursuit of multiple objectives by public sector organisations makes it difficult to
assess and compare their performance. Considering objectives in isolation ignores the
possibility of correlations between objectives, and a single index of performance
requires subjective judgements to be made about the relative value of each objective.
An alternative approach is to estimate a multivariate system of equations in which
objectives are analysed individually but correlations across objectives are considered
explicitly. We analyse the performance of English Health Authorities against thirteen
objectives using hierarchical data for electoral wards that are nested within Health
Authorities. We find evidence of correlation across objectives, suggesting that some
are complementary and others subject to trade-off. The estimates generated when
assessing performance with multivariate multilevel models as compared to ordinary
least squares or multilevel models differ, with the magnitudes varying by objective
and Health Authority.
JEL Classification: C33; D21; I18; L31
Keywords: Multilevel multivariate models; Performance analysis; Public sector
organisations.
Introduction
The pursuit of multiple objectives by public sector organisations makes it
difficult to assess and compare their performance. Commonly performance
assessment of such organisations is a relatively unsophisticated process, consisting of
measurement against a set of objectives that may or may not be aggregated into a
single index of organisational performance. This type of approach neglects the
possibility that organisational achievement may be correlated across objectives. This
correlation may be positive, if progress against one indicator simultaneously advances
another, perhaps because good management promotes all-round performance. But the
correlation may be negative if trade-offs are involved, such as when resources have to
be diverted from one activity in order to meet some other objective.
In this paper we consider methods designed to take account of inter-
relationships among objectives when assessing organisational performance. We
compare techniques to assess the performance of health authorities in the English
National Health Service against a variety of objectives (performance indicators). Our
data are reported at electoral ward level, with wards being clustered within health
authorities, hence the data have a hierarchical (or multilevel) structure.
We build up our methodology in three stages. First, we ignore the hierarchical
structure of the data and the possibility that correlations may exist across objectives.
Treating the health authority as the unit of analysis, we estimate a multiple regression
model separately for each objective, allowing for the possibility that variations in
performance might be attributable to differences in population demography and socio-
economic conditions. Levels of observed performance above those predicted by the
regression model are interpreted as indicative of above average performance by the
health authority. Second, recognising the ‘wards within health authorities’
hierarchical structure to the data, we separately identify a health authority effect from
a ward-level effect on performance for each objective. Wards are merely geographical
constructs, groupings of the population for electoral purposes. They are not decision-
making organisations. Thus, the ward effect is interpreted as random variation and the
health authority effect as indicative of health authority performance. Finally,
correlations across objectives are considered by estimating a multivariate, multilevel
model that treats each objective as part of a system of equations.
The paper is structured as follows. In the next section we consider the
drawbacks of standard approaches to performance assessment in the presence of
multiple objectives. We then introduce the multivariate approach to performance
assessment. We describe the data in the following section and then detail the models
to be estimated. Estimation results are then presented, together with a comparison of
the results provided by the different estimation procedures in terms of the impact
upon model parameters and estimates of the relative performance of health
authorities. Concluding comments are offered in the final section.
Conventional approaches
In contexts where organisations pursue multiple objectives it is not a simple matter to
assess and make comparative statements about overall performance. Standard
approaches can be classified into three groups.
First, objectives can be considered in isolation, whether individually or as a suite of
performance indicators. As well as there being a danger that only a partial view of
performance is provided, this approach treats objectives as independent by ignoring
any possibility that they might be related. This leads to a loss of information and
potentially biased estimates of performance.
The second type of approach involves imposing a single over-riding objective on
organisations. An example is the balanced scorecard which, when applied to nonprofit
and government agencies, requires “an over-arching objective at the top of their
scorecard that represents their long-term objective” (Kaplan and Norton, 2001). The
multiple sub-ordinate objectives are then “oriented toward improving such a high-
level objective” (Kaplan and Norton, 2001). As well as requiring an over-arching
objective, about which there may not be agreement, the technique does not offer
guidance about how to analyse formally the extent to which the multiple subsidiary
objectives support the main aim and how these might be inter-related. Either these
issues are overlooked or it is left to the judgement of those involved in the
development of an organisation’s balanced scorecard.
The final class of techniques contains those that attempt to combine multiple
objectives into a single index. However the creation of a single index is neither
straightforward nor uncontroversial. There are three fundamental challenges: what
relative value to place on each objective, how to guard against losing information, and
how to model the constraints that organisations face in pursuing each objective. Each
of these is discussed in turn.
Valuation of objectives
As regards relative values, consider organisations that pursue a set of objectives, Y,
comprising I separate objectives, Y={y1, y2, … , yI}. In order to create a single index
of overall organisational performance, it is necessary to combine the separate
components of the objective set in some way. This requires that individual objectives
are weighted such that
\[ Y = w_1 y_1 + w_2 y_2 + \dots + w_I y_I = \sum_{i=1}^{I} w_i y_i . \tag{1} \]
This raises the question of what values the weights, w, should reflect. Consider a
production possibility frontier (FF) for two objectives, y1 and y2, and two individuals
that place different values on these objectives, as reflected by the indifference curves
IC1IC1 and IC2IC2 in figure 1. The slopes of these curves at point of tangency with FF
reflect the individuals’ relative valuations of the two objectives and give rise to
different preferred points of production. It cannot be determined what constitutes the
optimal mix of objectives.
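To make the weighting problem concrete, the following minimal sketch computes the index in equation (1) for two hypothetical organisations under two hypothetical sets of weights. The scores, the weights and the function names are illustrative assumptions rather than values from the paper, and higher scores are taken to be better in this toy example.

```python
def composite_index(scores, weights):
    """Weighted sum of objective scores, Y = sum_i w_i * y_i, as in equation (1)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * y for w, y in zip(weights, scores))


# Hypothetical scores on two objectives (y1, y2); higher is better here.
organisations = {"A": (0.9, 0.3), "B": (0.4, 0.8)}

# Two hypothetical valuations of the same pair of objectives.
weights_person_1 = (0.8, 0.2)  # places more value on objective 1
weights_person_2 = (0.2, 0.8)  # places more value on objective 2


def best_organisation(weights):
    """Name of the organisation with the highest composite index under `weights`."""
    return max(organisations, key=lambda name: composite_index(organisations[name], weights))


print(best_organisation(weights_person_1))  # A
print(best_organisation(weights_person_2))  # B
```

The ranking reverses when the weights change, even though the underlying scores are identical, which is why the choice of weights cannot be treated as a purely technical matter.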
Figure 1 around here
One way to resolve this problem would be to impose a single set of weights, the
source of which might depend on the context of the analysis. Apart from the practical
and technical difficulties associated with the construction of such weights, applying a
single set of weights will not be appropriate in contexts where organisations enjoy
discretion over what emphasis to place on different objectives. For example, such an
approach would be contrary to policies aimed at decentralising decision-making or
encouraging organisations to be more responsive to the concerns of the populations or
client groups that they serve.
The extreme alternative is to allow weights to vary freely, so that they are specific to
each organisation. This is standard practice in applications of Data Envelopment
Analysis (DEA), a technique that has been developed to allow assessment of
organisations that pursue multiple objectives. DEA allows weights to vary such that
each organisation is seen in its best possible light. Some see this as an attractive
feature of the methods (Cooper et al., 2000) but the implication is that it limits
analytical discriminatory power, such that organisations are more difficult to
distinguish in terms of their relative overall performance. Returning to the example
illustrated in figure 1, with unconstrained weights, it will not be possible to
discriminate among any of the organisations located on the production possibility
frontier, FF.
But the inability of DEA to discriminate among organisations is more serious than
implied by figure 1. This is because the location of the production possibility frontier
is, of course, unknown. DEA overcomes this by constructing an ‘empirical’ frontier
that envelops those observations furthest from the origin. Suppose that organisation
A is located on the true (unobserved) production possibility frontier, as in figure 2,
and that we wish to compare its performance to that of organisation B, also pursuing
these two objectives. Only if organisation B is located within the shaded area OCAD
will DEA consider that the performance of organisation A is superior to that of
organisation B. In this case, subject to there being but two organisations, the DEA
frontier appears as CAD. If organisation B is located anywhere else, its position will
be used to construct a different DEA frontier on which both organisations are located.
Thus, by definition, the performance of the two organisations will be deemed
equivalent1.
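The two-organisation argument can be sketched numerically. The code below is not a general DEA solver: it assumes exactly two organisations, a common single unit of input, strictly positive objective values and freely chosen non-negative weights. Under those assumptions the score of an organisation reduces to the largest ratio of its objective values to those of its comparator, capped at one. The function name and the coordinates are hypothetical.

```python
def dea_score_two_units(unit, other):
    """Score of `unit` against a frontier built from `unit` and `other` only.

    Assumes a common unit input and strictly positive objective values.
    With unconstrained non-negative weights, the most favourable weighting
    puts all weight on the objective where `unit` compares best, so the
    score is max_r(unit[r] / other[r]), capped at 1.
    """
    if len(unit) != len(other):
        raise ValueError("both units need a value for every objective")
    if min(unit) <= 0 or min(other) <= 0:
        raise ValueError("objective values must be strictly positive")
    best_ratio = max(u / o for u, o in zip(unit, other))
    return min(1.0, best_ratio)


org_a = (4.0, 6.0)            # hypothetical position of organisation A
org_b_inside = (3.0, 5.0)     # B below A on both objectives
org_b_elsewhere = (7.0, 1.0)  # B better on one objective, much worse on the other

print(dea_score_two_units(org_b_inside, org_a))     # below 1: A is judged superior
print(dea_score_two_units(org_a, org_b_elsewhere))  # 1.0
print(dea_score_two_units(org_b_elsewhere, org_a))  # 1.0: deemed equivalent to A
```

Only when B is worse than A on every objective does its score fall below one; in every other position both organisations score one and cannot be distinguished.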
Figure 2 around here
Loss of transparency
The second, but related, challenge with the use of a single index is that important
information may be ‘lost’, and this may have implications both for external bodies
that wish to judge organisational performance and for the organisations themselves.
The use of a single index is appealing to external bodies because it promises to
simplify the assessment process. Superficially it appears much less demanding to
make judgments on the basis of a single index than to have to grapple with several
dimensions of performance. But this requires that contentious technical aspects
involved in constructing the single index are understood and agreed upon. This is a
particular issue when the index is to be used for regulatory purposes, when the choice
of weights is properly a political decision. In such circumstances, allowing weights to
vary freely or allowing analysts to select the weights can be considered an abrogation
of political responsibility (Smith and Street, 2005).
1 This problem cannot be resolved by using another popular technique, stochastic frontier analysis (SFA). Applications of SFA in a multiple objective context that do not explicitly address the weighting issue entail that objective weights simply reflect the sample average values. This is not an appealing implication, particularly given that, in undertaking the analysis, there is a pre-supposition of the presence of sub-optimal performance.
From an organisational perspective, if performance assessment is to engender
behavioral change, it is essential that the assessment technique provides clear
messages (Nutley and Smith, 1998). The use of a single index carries the risk that
important information will be difficult to access. It is not immediately apparent to
organisations how they perform on the specific performance dimensions that have
been amalgamated into the single index and, hence, where they should focus their
attention. Moreover, if the index is based on unconstrained weights, organisations will
not be able separately to identify sub-optimality in performance from differences in
the relative values placed upon objectives. For example, in applying DEA with
unconstrained weights, it is possible to place an organisation in its best possible light
simply by assigning a zero weight to those objectives on which it performs poorly.
Exogenous constraints
The third problem concerns how to account for exogenous constraints. A major
challenge in any application of performance assessment is to be able to attribute
differences in observed performance to differences in organisational effort. The
problem is that observed performance may be explained, at least partially, by factors
over which organisations may have little control, such as the environment in which
they operate or the population they serve. It is possible to condition performance on
such factors and to estimate their importance, most obviously through regression
analysis. But if performance across multiple objectives is assessed using a single
index, it is possible to discern only an average effect. It may be that each objective is
influenced by different constraining factors or that the influence of a specific
constraint varies according to each objective. More specific modelling of these
constraints facilitates a more accurate assessment of organisational performance.
The multivariate approach
Given these drawbacks, a middle way between the analysis of objectives in isolation
and the creation of a single index is recommended. This can be achieved by applying
multivariate or seemingly unrelated regression (SUR) techniques to consider I
objectives by joint estimation of a system of equations of the following form (Zellner,
1962):
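\[ y_{ik} = \beta_{0i} + \mathbf{x}_{1ik} \boldsymbol{\beta}_{1i} + u_{ik}, \qquad i = 1,2,\dots,I; \; k = 1,2,\dots,K, \tag{2} \]
where $y_{ik}$ is the performance indicator for the i-th objective for the k-th organisation, $\beta_{0i}$ is a coefficient, $\mathbf{x}_{1ik}$ is a $1 \times q_i$ vector of $q_i$ regressors specific to objective i, $\boldsymbol{\beta}_{1i}$ is a $q_i \times 1$ vector of coefficients, and $u_{ik}$ is an error with $E(u_{ik}) = 0$. By stacking
the K organisations above each other, the multivariate model for the set of I objectives
may be written as:
\[ \mathbf{y}_i = \boldsymbol{\beta}_{0i} + \mathbf{X}_{1i} \boldsymbol{\beta}_{1i} + \mathbf{u}_i, \qquad i = 1,2,\dots,I, \tag{3} \]
or:
\[
\begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_I \end{bmatrix}
=
\begin{bmatrix} \boldsymbol{\beta}_{01} \\ \boldsymbol{\beta}_{02} \\ \vdots \\ \boldsymbol{\beta}_{0I} \end{bmatrix}
+
\begin{bmatrix}
\mathbf{X}_{11} & 0 & \cdots & 0 \\
0 & \mathbf{X}_{12} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \mathbf{X}_{1I}
\end{bmatrix}
\begin{bmatrix} \boldsymbol{\beta}_{11} \\ \boldsymbol{\beta}_{12} \\ \vdots \\ \boldsymbol{\beta}_{1I} \end{bmatrix}
+
\begin{bmatrix} \mathbf{u}_1 \\ \mathbf{u}_2 \\ \vdots \\ \mathbf{u}_I \end{bmatrix},
\tag{4}
\]
where $\mathbf{y}_i$, $\boldsymbol{\beta}_{0i}$ and $\mathbf{u}_i$ are $K \times 1$ vectors, $\mathbf{X}_{1i}$ is a $K \times q_i$ matrix, and $\boldsymbol{\beta}_{1i}$ is a $q_i \times 1$ vector.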
If the performance of an organisation k on two objectives i and p is related by
unobservable factors (e.g. managerial decisions), then $u_{ik}$ would be correlated with
$u_{pk}$ for i ≠ p. By estimating a multivariate model, we allow for such correlation. This
implies that:
\[ E(u_{ik} u_{ph}') = \sigma_{ip} \text{ if } k = h, \text{ and } 0 \text{ otherwise}, \tag{5} \]
where k and h denote two different organisations.
In addition to explicit consideration of correlations between objectives, a major
advantage of the multivariate method for performance assessment in the context of
multiple objectives is that it does not require objectives to be weighted because
information on relative performance is provided specifically for each objective.
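The cross-objective covariance can be illustrated with a small sketch. It does not implement the joint (SUR) estimator; it only fits each equation separately by least squares with one hypothetical regressor and then examines the correlation between the two sets of residuals, which is the quantity the multivariate model allows to be non-zero. All data and function names are made up for illustration.

```python
import math
from statistics import mean


def ols_residuals(x, y):
    """Residuals from a least-squares regression of y on a constant and x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def correlation(u, v):
    """Sample correlation between two equally long sequences."""
    u_bar, v_bar = mean(u), mean(v)
    cov = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    var_u = sum((a - u_bar) ** 2 for a in u)
    var_v = sum((b - v_bar) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


# Hypothetical data for five organisations: one regressor, two objectives.
x = [1, 2, 3, 4, 5]
y1 = [4.0, 2.0, 7.0, 4.0, 8.0]
y2 = [0.5, 4.0, 0.5, 5.0, 2.5]

u1 = ols_residuals(x, y1)
u2 = ols_residuals(x, y2)
print(correlation(u1, u2))  # negative: an unexplained gain on one objective comes with a loss on the other
```

In this constructed example the residuals are exact mirror images, so the correlation is at its lower bound; real data would give something in between, and a joint estimator would use that information rather than discard it.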
In the remainder of the paper we apply this technique to the analysis of health
authorities operating in the English National Health Service.
Data
Thirteen indicators of health authority performance are available (table 1). Variously,
these address health system objectives relating to NHS performance domains ‘health
outcomes’, ‘clinical quality’, ‘access’ and ‘efficiency’. All have been used in official
assessments of NHS performance but the set of indicators analysed here is not
intended as an exhaustive list of health system objectives (Hauck et al., 2003). Most
of the indicators are expressed as ratios, in which the actual value is standardised
against national values to take account of the composition of the local population in
terms of age, gender and case mix. For all indicators a higher value implies worse
performance.
Table 1 around here
Even after standardisation, differences in performance may arise from additional
variations in population characteristics. To account for this possibility, we condition
the performance indicators on various socioeconomic factors which have been shown
to be associated with the needs of the population and the utilisation of health care
(Carr-Hill et al., 1994). For the majority of the performance indicators, we condition
on factors used by the English Department of Health to estimate population
requirements for acute care resources. These variables are the proportion of those of
pensionable age living alone (OLDAL), the proportion of dependents in single carer
households (SCARE), and the proportion of the economically active population that is
unemployed (UNEMP).
Socioeconomic characteristics used by the Department of Health to estimate
population requirements for non-acute services are taken into account when
considering performance with respect to controlling the costs of psychiatry services
(Carr-Hill et al., 1994). These are the proportion of those of pensionable age living
alone (OLDAL), the proportion of persons in lone parent households (LOPAR), the
proportion of dependents in households with no carer (NCARE), and the proportion
of residents born in the New Commonwealth (NCMW). All explanatory variables are
centred around zero by subtracting the grand mean from the value for each ward.
The dataset is merged from several sources, including the 1991 census of population,
NHS administrative data, the 1990/91 Hospital Episode Statistics, a database of all
hospital inpatient episodes, and the 1990/91 Health Service Indicators package, which
includes summary indicators of health authority performance. The data are available
for each of the 4985 English electoral wards (although there are some missing data).
Wards have an average population of some 9,600 and are clustered geographically
within 186 health authorities covering populations of around 250,000. Our interest is
in assessing the contribution to observed performance made by these health
authorities. Descriptive data are provided in table 2.
Table 2 around here
Methods
Aggregate OLS models
For comparison purposes, we first estimate a conventional multiple regression model
for each performance indicator in which the population-weighted ward-level data are
aggregated to the relevant health authority. The health authority is thereby considered
the unit of analysis and the model takes the following form:
\[ y_k = \beta_0 + \mathbf{x}_{1k} \boldsymbol{\beta}_1 + u_k, \qquad k = 1,2,\dots,K, \tag{6} \]
where $y_k$ represents the population-weighted mean ward value of performance
indicator y across all wards in the k-th health authority, and $\mathbf{x}_{1k}$ represents a vector of
socio-economic variables in the k-th health authority. A positive slope parameter $\beta_1$
can be interpreted as suggesting that worse socio-economic conditions are associated
with levels of performance worse than expected given the age and sex standardised
population characteristics of the health authority.
The term $u_k$ is the random error for the k-th health authority, assumed to have zero
mean and constant variance ($\sigma_u^2$). We can interpret $u_k$ as the parallel departure from
the mean regression line of the k-th health authority. Small estimated values of $u_k$
indicate health authorities with close to average performance, after controlling for the
socio-economic situation of their wards. Large positive (negative) estimated values of
$u_k$ represent health authorities with an observed performance markedly worse (better)
than that predicted on the basis of demographic standardisation and socio-economic
conditions. If there is a large variation in $u_k$, this suggests marked differences in
performance among health authorities. The model is estimated by ordinary least
squares using STATA 7.
This formulation ignores the hierarchical structure of the data by taking means
(weighted for population size) for $y_k$ and $\mathbf{x}_{1k}$ over all wards within a health authority.
There are a number of drawbacks to this (Rice and Leyland, 1996). First, we fail to
make full use of the available information. Second, there is a danger of committing the
aggregation (or ecological) fallacy, in which a relationship found at the aggregate level
may not exist among the individuals (electoral wards in this case) from which the data
have been aggregated. For example, the average wait for hospital admission for
residents of a particular health authority may be no different from the national
average. However, this may disguise the possibility that waiting times depend on
where people live within this health authority, with people in one ward facing
substantially longer waits while those in another ward enjoy substantially better
access than average. Third, failure to account for clustering results in underestimates
of standard errors, which undermines significance tests. Fourth, the estimate of $u_k$
captures random variation as well as managerial differences between health
authorities and is therefore an unsatisfactory measure of performance.
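The aggregation step, and the information it discards, can be sketched as follows. The ward records, identifiers and the function name are hypothetical; the only thing taken from the text is that ward values are averaged within each health authority using population weights.

```python
from collections import defaultdict


def aggregate_to_authority(wards):
    """Population-weighted mean of a ward-level value for each health authority.

    `wards` is an iterable of (authority_id, population, value) tuples.
    """
    weighted_sum = defaultdict(float)
    population = defaultdict(float)
    for authority, pop, value in wards:
        weighted_sum[authority] += pop * value
        population[authority] += pop
    return {a: weighted_sum[a] / population[a] for a in weighted_sum}


# Hypothetical ward data: (health authority, population, indicator value).
wards = [
    ("HA1", 5000, 130.0),
    ("HA1", 15000, 90.0),
    ("HA2", 10000, 100.0),
    ("HA2", 10000, 100.0),
]

print(aggregate_to_authority(wards))  # {'HA1': 100.0, 'HA2': 100.0}
```

Both authorities receive the same aggregate value, yet the wards in the first differ markedly from one another while those in the second do not. An analysis of the aggregated data alone cannot see this distinction.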
Multilevel (ML) models
Recognising the hierarchical structure of the data, we next analyse the data at electoral
ward level, these wards being merely geographical (rather than organisational)
constructs. Wards are clustered within health authorities and these wards are likely to
share closer similarities to one another than to wards elsewhere as they are similarly
affected by the decision making of the health authority to which they belong. Also,
they may share unobserved socio-demographic influences over and above those
included in the model. Importantly for estimation, this clustering implies that wards
cannot be considered independent observations. Rather their common (perhaps
unobservable) similarities imply correlation among wards within each health
authority, and this correlation invalidates classical OLS estimation because the iid
assumption is not met. To account for inter-dependence among wards, we define a
two-level random intercept model as follows:
\[ y_{jk} = \beta_0 + \mathbf{x}_{1jk} \boldsymbol{\beta}_1 + u_{0k} + e_{0jk}, \qquad j = 1,2,\dots,J; \; k = 1,2,\dots,K, \tag{7} \]
where $y_{jk}$ represents performance indicator y in the j-th ward within the k-th health authority,
and $\mathbf{x}_{1jk}$ represents the socio-economic conditions in the j-th ward in the k-th health
authority. The size of wards varies considerably (range 2,041 – 33,073 people) so
estimation incorporates population weights. In this context, weights are applied to the
error components (Hauck et al., 2003). The weights applied at ward-level are
calculated as $w_j = n_{jk}^{-1/2} / \mathrm{ave}(n_{jk}^{-1/2})$ and those at health authority level as
$w_k = n_k^{-1/2} / \mathrm{ave}(n_k^{-1/2})$, where $n_{jk}$ is the population in ward j in health authority k, $n_k$ is
the population in health authority k, and ave(.) denotes the average across the
quantities contained in parentheses.
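A minimal sketch of the weight calculation, assuming each weight is the inverse square root of a unit's population divided by the average of that quantity across units. The function name is hypothetical, and the three populations are chosen only to span the ward size range quoted above.

```python
def normalised_weights(populations):
    """Weights of the form n**(-1/2) / ave(n**(-1/2)), so that they average to one."""
    if not populations:
        raise ValueError("at least one population is required")
    if min(populations) <= 0:
        raise ValueError("populations must be strictly positive")
    raw = [n ** -0.5 for n in populations]
    average = sum(raw) / len(raw)
    return [r / average for r in raw]


# Hypothetical ward populations: the smallest, a typical and the largest ward size.
ward_populations = [2041, 9600, 33073]
weights = normalised_weights(ward_populations)
print(weights)
```

Dividing by the average keeps the weights centred on one, so they rescale the error components of small and large units relative to one another without changing the overall scale.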
The terms u0k and e0jk are error components such that u0k relates to the k-th health
authority (interpreted as the k-th health authority’s level of relative performance) and
e0jk is the random error for the j-th ward within the k-th health authority. Both error
components are assumed to have zero mean and constant variance ($\sigma^2_{u_0}$, $\sigma^2_{e_0}$). We
place the same interpretation upon $u_{0k}$ as we did on $u_k$ under the earlier formulation,
but the two estimates are notably different. Most importantly the greater the extent of
clustering, the greater will be the underestimation of standard errors of fixed part
parameters by OLS models (Rice and Leyland, 1996). The ML formulation may also
lead to a change in the relative performance of health authorities. This will depend on
the extent to which performance varies among health authorities and on the extent to
which health authorities are able to influence the performance indicator in question.
The intra-class correlation coefficient, ICCML, is used to assess the proportion of total
variance attributable to health authority influence, and is calculated as:
\[ ICC_{ML} = \sigma^2_{u_{0k}} \left( \sigma^2_{u_{0k}} + \sigma^2_{e_{0jk}} \right)^{-1}, \qquad 0 < ICC_{ML} < 1 . \tag{8} \]
Larger values of $ICC_{ML}$ are indicative of greater potential for health authorities to
influence the value of the relevant performance indicator (Hauck et al., 2003).
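As a worked illustration of the intra-class correlation, the sketch below computes the share of total variance that lies at health authority level from two variance components. The function name and the numerical values are hypothetical.

```python
def intra_class_correlation(var_authority, var_ward):
    """Share of total variance attributable to the health authority level."""
    if var_authority < 0 or var_ward < 0:
        raise ValueError("variance components cannot be negative")
    total = var_authority + var_ward
    if total == 0:
        raise ValueError("at least one variance component must be positive")
    return var_authority / total


# Hypothetical variance components for one performance indicator.
print(intra_class_correlation(var_authority=1.0, var_ward=3.0))  # 0.25
```

With these made-up values, one quarter of the unexplained variation sits between health authorities and three quarters between wards within them.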
The computer package MLwiN (beta version 2.0) is used for estimation (Rasbash et
al., 2000).
Multivariate multilevel (MVML) model
The multilevel model described above is calibrated separately for each performance
indicator. This ignores the possibility that levels of achievement might be correlated
across indicators. This correlation may be positive, if progress against one indicator
simultaneously advances another, perhaps because good management promotes all-
round performance. But the correlation may be negative if trade-offs are involved,
such as when scarce resources that might be employed to achieve one objective are re-
directed to pursue another. Analysis that recognises the possibility of simultaneity in
the pursuit of multiple objectives uses additional information on the correlations to
generate superior measures of organisational achievement than a piecemeal analysis.
The multilevel framework can be extended to consider multiple outcomes simply by
recognising that the performance indicators themselves are clustered, in this context
within wards (Gilthorpe and Cunningham, 2000, Yang et al., 2002). This is a
multivariate model in a multilevel context. By considering the performance indicators
as the lowest tier in the data hierarchy, the possibility of within-ward and within-
health authority correlation among indicators can be assessed. Thus the multivariate
multilevel model (MVML model) is conceptualised as a three-level multilevel model,
in which the set of I performance indicators (level 1) are clustered within J wards
(level 2), which are themselves clustered within K health authorities (level 3). The
MVML model can be written as:
\[ y_{ijk} = \beta_{0i} + \mathbf{x}_{1ijk} \boldsymbol{\beta}_{1i} + u_{0ik} + e_{0ijk}, \qquad i = 1,2,\dots,I; \; j = 1,2,\dots,J; \; k = 1,2,\dots,K. \tag{9} \]
Thus, yijk is the i-th performance indicator for the j-th ward clustered within the k-th
health authority. The other parameters are analogous to their counterparts in the
aggregate OLS and ML models, except that we now consider an additional level i.
The error terms $u_{0ik}$ and $e_{0ijk}$ are both assumed to be normally distributed with zero
mean and constant variance ($\sigma^2_{u,i}$, $\sigma^2_{e,i}$) for each indicator. The ward level error $e_{0ijk}$
represents the random error for performance indicator i in the j-th ward. We allow for
the possibility that performance indicators may be correlated within the same wards,
with $\mathrm{cov}(e_{0ijk}, e_{0pjk}) = \sigma_{e,ip}$, but assume that a performance indicator is independent
across wards in different health authorities, hence $\mathrm{cov}(e_{0ijk}, e_{0igh}) = 0$. This latter
assumption is restrictive in that neighbouring wards in different health authorities may
share unobserved characteristics that influence performance, but we lack the
geographical information to allow for this possibility.
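The three-level layout, with indicators nested within wards nested within health authorities, can be sketched as a reshaping from one record per ward to one record per indicator and ward. The record structure, identifiers, values and function name are hypothetical; only the indicator names are taken from the paper.

```python
def stack_indicators(ward_records):
    """Return one row per (indicator, ward) pair, the lowest tier of the hierarchy."""
    rows = []
    for record in ward_records:
        for indicator, value in record["indicators"].items():
            rows.append(
                {
                    "authority": record["authority"],  # level 3
                    "ward": record["ward"],            # level 2
                    "indicator": indicator,            # level 1
                    "value": value,
                }
            )
    return rows


# Hypothetical wards, each reporting three of the indicators (values are made up).
ward_records = [
    {"authority": "HA1", "ward": "W1",
     "indicators": {"SMR064": 98.0, "EMOLD": 105.0, "WTSURG": 110.0}},
    {"authority": "HA1", "ward": "W2",
     "indicators": {"SMR064": 103.0, "EMOLD": 97.0, "WTSURG": 92.0}},
]

for row in stack_indicators(ward_records):
    print(row)
```

Each ward now contributes as many rows as it has indicators, which is what lets the covariance between indicators be estimated at both ward and health authority level.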
The health authority effect is captured by $u_{0ik}$. The covariance for the i-th and p-th
performance indicators within a health authority k is given by:
\[ \mathrm{cov}(u_{0ik}, u_{0pk}) = \sigma_{u,ip} . \tag{10} \]
These estimates of covariance can be used to calculate the degree of correlation rip
between performance indicator i and p:
\[ r_{ip} = \frac{\sigma_{u,ip}}{\sqrt{\sigma^2_{u,i} \, \sigma^2_{u,p}}} . \tag{11} \]
If the correlation is positive, this implies that a health authority that has better than
average performance for indicator i also has above average performance for indicator
p. A negative correlation implies that above average performance for the one indicator
coincides with poorer performance for the other. This correlation is interpreted as
being due to unobservable influences on performance, such as the managerial
competency of the health authority or the shared influence of environmental
conditions over and above those factors that we have controlled for.
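The conversion from covariances to correlations can be sketched as below, using the usual definition of a correlation coefficient as a covariance divided by the product of the two standard deviations. The covariance matrix and the function name are hypothetical and do not come from the estimates reported in the paper.

```python
import math


def correlation_from_covariance(cov):
    """Convert a square covariance matrix (list of lists) into a correlation matrix."""
    size = len(cov)
    if any(len(row) != size for row in cov):
        raise ValueError("covariance matrix must be square")
    sd = [math.sqrt(cov[i][i]) for i in range(size)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(size)] for i in range(size)]


# Hypothetical health-authority-level covariance matrix for three indicators.
cov_u = [
    [4.0, 1.0, -3.0],
    [1.0, 1.0, 0.0],
    [-3.0, 0.0, 9.0],
]

for row in correlation_from_covariance(cov_u):
    print(row)
```

In this made-up example the first and second indicators are positively correlated (0.5), the first and third negatively (-0.5), and the second and third not at all, corresponding to complementarity, trade-off and independence respectively.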
Consistent with the ML models, we estimate the intra-class correlation coefficient as
\[ ICC_{MVML} = \sigma^2_{u_{0ik}} \left( \sigma^2_{u_{0ik}} + \sigma^2_{e_{0ijk}} \right)^{-1}, \qquad 0 < ICC_{MVML} < 1 . \tag{12} \]
If there are correlations among performance indicators, the residuals from the ML
models, $u_{0k}$, and the residuals from the MVML model, $u_{0ik}$, may differ for the same
performance indicator i.
We conduct a Likelihood Ratio Test to determine whether the correlations among
residuals are jointly zero or not. The ML models are the restricted models because
they impose 78 ($= (13^2 - 13)/2$) zero correlation assumptions on the residuals. The
test statistic is given as:
\[ \lambda = 2 \left( LLF_{MVML} - \sum_{i=1}^{I} LLF_{ML,i} \right), \qquad i = 1,\dots,I, \tag{13} \]
where $LLF_{MVML}$ is the log-likelihood function for the multivariate multilevel model,
and $LLF_{ML,i}$ is the log-likelihood function for a multilevel model applied to the single
performance indicator i. Asymptotically, $\lambda$ has a chi-square distribution. A significant
test statistic indicates that estimation as a MVML model is preferable to separate
estimation of a set of ML models, and implies the presence of correlation among
performance indicators.
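The arithmetic of the test can be sketched as follows, assuming the statistic is twice the difference between the log-likelihood of the joint model and the sum of the log-likelihoods of the separate models. The log-likelihood values and function names are hypothetical; only the count of 13 indicators is taken from the paper.

```python
def likelihood_ratio_statistic(llf_joint, llf_separate):
    """Twice the gain in log-likelihood from joint rather than separate estimation."""
    return 2.0 * (llf_joint - sum(llf_separate))


def zero_correlation_restrictions(n_indicators):
    """Number of distinct indicator pairs, i.e. correlations set to zero: I(I-1)/2."""
    return n_indicators * (n_indicators - 1) // 2


# Hypothetical log-likelihoods: one joint model versus three separate models.
statistic = likelihood_ratio_statistic(-1000.0, [-400.0, -350.0, -300.0])
print(statistic)                          # 100.0
print(zero_correlation_restrictions(13))  # 78
```

The statistic would then be compared with a chi-square critical value whose degrees of freedom equal the number of restrictions.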
Accounting for outliers
In applying the OLS, ML and MVML specifications we test for the presence of highly
atypical health authorities using a procedure based on the interquartile range
suggested by Hamilton (Hamilton, 1992). The procedure defines ‘severe’ outliers as
$u_{.} < Q(25) - 3\,IQR$ or $u_{.} > Q(75) + 3\,IQR$, where Q(25) and Q(75) are the 25th and
75th percentiles of the distribution of $u_{.}$, and IQR is the interquartile range. Severe
outliers comprise about 0.0002% of the normal population. The presence of a severe
outlier in samples of n<300 is considered sufficient evidence to reject normality at a
5% significance level (Hamilton, 1992). The outlier test is applied for each
performance indicator, with an observation defined as an outlier for the relevant
indicator if it appears as such under at least one of the three specifications. Outliers
are identified for only four of the performance indicators. For EMOLD, 3 health
authorities are defined as upper outliers (‘very bad’ performers); for DEATHS, 11
health authorities are lower outliers (‘very good’ performers, reporting zero deaths
after surgery); for WTRADIO, 5 health authorities are lower outliers (‘very good’
performers with very short waiting times); and for ELECTEPS, 6 health authorities
are lower outliers (‘very good’ performers with a high number of elective episodes).
Rather than dropping these observations, it is more appropriate to capture their
atypical influence by including a dummy variable (=1 if HA is outlier, =0 if not) that
is specific to each health authority and to each of the four performance indicators
affected (Langford and Lewis, 1998). Inclusion of the dummy variables conditions
residual variances on variations in performance which are non-normal, so that
variations in health authority effects for the affected authorities can be interpreted as
variations in performance. We adopted this procedure in estimating the OLS and ML
models, but the MVML model failed to iterate successfully with the addition of 24
dummy variables. To overcome this, a single dummy variable is applied for all the
outlying health authorities for each indicator in order to capture their average effect.
The ML model estimates are assumed to capture the individual effects for these
outliers.
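The outlier rule and the single dummy per indicator described above can be sketched as follows. This is a minimal illustration only: the effect values are hypothetical, the function name `severe_outliers` is not from the original analysis, and the quartile convention of Python's standard library is assumed, which may differ from the convention used in Hamilton's routine.

```python
from statistics import quantiles


def severe_outliers(values):
    """Flag values lying more than 3 IQR below Q(25) or above Q(75)."""
    # Quartile convention is an assumption; other conventions shift the cut-offs slightly.
    q25, _, q75 = quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    lower, upper = q25 - 3 * iqr, q75 + 3 * iqr
    return [v < lower or v > upper for v in values]


# Hypothetical health authority effects for one indicator (illustrative only).
effects = [-0.02, 0.01, 0.00, 0.03, -0.01, 0.02, -0.03, 0.01, 0.00, 0.95]
flags = severe_outliers(effects)

# Single dummy for the indicator: 1 if the authority is a severe outlier, 0 if not.
outlier_dummy = [int(flag) for flag in flags]
```

In this toy series only the last authority is flagged, so the dummy would absorb its atypical effect while leaving the other nine observations untouched.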
Results
Model parameters
Estimation results are shown in table 3. As examples, we select one indicator from
each of the four performance domains – SMR064, EMOLD, WTSURG, and
DCRATE.2 The aim of the models is to control for influences upon performance that
are considered exogenous to health authority control rather than to explain variation in
each performance indicator. Hence, primary interest is in the value of the residuals
and not in the parameter estimates themselves. Nevertheless some features are worth
noting. First, parameter estimates are in close agreement for the ML and MVML
models, but differences are evident between these models and the aggregate OLS
models. These differences reflect errors in estimation that arise from a failure to
account for the hierarchical nature of the data, with the relationships estimated by the
aggregate model being contaminated by ward-level effects.
2 Results for the other indicators are available from the authors on request.
Table 3 around here
Second, for the majority of performance indicators, parameter estimates are
significantly positive, implying that areas with worse than average socio-economic
conditions are likely to have levels of achievement worse than that expected given the
age-sex standardised characteristics of the population. This suggests that age-sex
standardisation alone is insufficient to capture the differences in the characteristics of
ward populations.
Third, not only the size but sometimes also the direction of influence varies according
to the performance indicator. For example, as would be expected, health authorities
with a higher proportion of elderly people living alone (OLDAL) are likely to have a
higher proportion of elderly people admitted to hospital as emergencies (EMOLD).
But such health authorities are likely to have lower waiting times (WTSURG),
perhaps because the elderly are less likely to be placed on surgical waiting lists.
Correlations across performance indicators
The Likelihood Ratio test comparing the ML and MVML models clearly rejects the
null hypothesis of jointly zero correlations among the residuals (λ = 9,495, df = 78,
p<0.001). This indicates that the MVML model improves inference by allowing
explicitly for correlations among the performance indicators. The correlation
coefficients for the health authority effects across the various indicators are presented
in table 4. Coefficients with an asterisk are significant at the 5% level.
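As a brief aside, the test statistic reported above can be written out explicitly. This is a sketch under the assumption that the ML specification is nested within the MVML specification; the symbols $\ell_{ML}$, $\ell_{MVML}$ (maximised log-likelihoods) and $p$ (number of indicators) are introduced here for illustration only.

```latex
\lambda = -2\left(\ell_{ML} - \ell_{MVML}\right) \sim \chi^2_{df},
\qquad
\frac{p(p-1)}{2} = \frac{13 \times 12}{2} = 78 \quad (p = 13 \text{ indicators})
```

The reported 78 degrees of freedom coincide with the number of distinct pairs that can be formed among the thirteen performance indicators.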
Table 4 around here
We find statistically significant positive correlations among the health outcome
indicators, SMR064, SMR674 and SIR074. These correlations imply that in an area
with above-average mortality rates for ages 0-64, mortality rates for ages 65-74 and
rates of chronic illness will be above average also. There is a statistically significant
positive correlation (rip = 0.41) between the two clinical quality indicators, DEATHS
and EMOLD, implying that areas with a higher proportion of emergency admissions
also report more deaths following hospital surgery. There is an almost perfect
correlation between WTSURG and WTLONG (rip = 0.95), suggesting that one of
these waiting times indicators is redundant.
These two measures of waiting time have a significant negative correlation with the
health outcome measures (from rip = -0.25 to rip = -0.16), which might be indicative of
trade-offs between these broad types of objectives: efforts directed at reducing
waiting times may have adverse consequences for these measures of health outcome.
There is also a negative correlation between health outcomes and the number of
elective episodes (from rip = -0.32 to rip = -0.19), which means that, in health
authorities with higher rates of illness and mortality, more elective procedures are
undertaken. In contrast, there is a significant positive correlation between the health
measures and the indicator measuring accessibility to GPs (from rip = 0.47 to rip =
0.48). This suggests that in areas with above average illness and mortality rates people
experience greater difficulties in accessing GP services.
Variations in performance between health authorities
Taking account of the hierarchical nature of the data and of the possibility of
correlation among performance indicators may have an impact on the extent to which
variations in performance can be attributed to health authorities. Table 5 provides
details of the variance components for the three models. For the ML and MVML
models the intra-class correlation coefficient ICC provides an indication of the extent
to which variations in performance, after conditioning upon socio-economic factors,
are attributable to health authorities.
This varies according to the performance indicator. It appears that health authorities
have a substantial role in determining performance as measured by waiting times and
day case rates. Around 70% of small area variation in performance against these
indicators is being attributed to health authorities, after controlling for variation due to
differences in socioeconomic conditions. In contrast, the impact of health authorities
on mortality rates is considerably less, explaining only 15% of the variation in
mortality rates for the under 65s and 19% of that for those in the 65-74 age group.
These estimates of health authority effects are little changed by the move to the
MVML formulation.
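The intra-class correlation coefficients discussed above can be reproduced from the variance components in table 5. The following is a minimal sketch, assuming the ICC is the health authority variance divided by the total (health authority plus ward) variance; the function and variable names are illustrative, and the inputs are the rounded ML figures from table 5 for four indicators.

```python
def icc(var_between, var_total):
    """Share of total variance attributable to the health authority level."""
    return var_between / var_total


# (health authority variance, total variance) for the ML model, from table 5.
table5_ml = {
    "SMR064": (0.005, 0.034),
    "SMR674": (0.006, 0.031),
    "WTSURG": (0.031, 0.041),
    "DCRATE": (0.005, 0.007),
}

ml_icc = {name: round(icc(between, total), 2)
          for name, (between, total) in table5_ml.items()}
```

Because the published variance components are themselves rounded, recomputed values for other indicators may differ from the tabulated ICC in the second decimal place.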
Table 5 around here
It is important to bear in mind that measurement error may result in performance
variations being over- or underestimated and wrongly attributed to the health
authority level. For example, systematic variations in data collection, actions of other
geographically defined agencies, or the influence of national public health policies
may lead to a mistaken attribution of the health authority effect as being due solely to
differences in health authority performance (Hauck et al., 2003). By a similar token,
variations across wards within each health authority may not be truly random. For
example, there may be systematic differences in the way that patients from different
wards within their shared health authority are treated by those who make decisions
that affect them, whether these be general practitioners, hospital staff, or local
managers. Such behaviour is most likely to be captured by the ward-level effect here,
though it would be more desirable to attribute it to the health authority. This is a
considerable challenge because such behaviour would be difficult to measure
accurately even were discrimination to be reasonably overt.
Sensitivity analysis of health authority effects
Taking account of the hierarchical nature of the data and of the possibility of
correlation among performance indicators may also have an impact on the estimates
for particular health authorities. The sensitivity of the relative performance of each
health authority to specification decisions can be illustrated graphically. Figures 3 to 6
plot the health authority effect estimated by the three models for SMR064, EMOLD,
WTSURG, and DCRATE. The health authority effect for the aggregate OLS model,
$u_k$, is plotted on the diagonal from the ‘bottom left corner’ (best performance) to the
‘top right corner’ (worst performance) of each figure. The health authority effects
deriving from the ML ($u_{0k}$) and MVML ($u_{0ik}$) models are indicated respectively by a
circle and a triangle. The vertical lines connecting these points depict the range in
values for each individual health authority, with longer lines indicating greater
sensitivity in individual values to the choice of model specification.
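The quantity depicted by each vertical line can be sketched as the spread between the largest and smallest of the three estimates for a health authority. The example below is illustrative only: the authority labels and effect values are hypothetical, and the function name `sensitivity_range` is not taken from the original analysis.

```python
def sensitivity_range(ols, ml, mvml):
    """Length of the vertical line: spread of the three model estimates."""
    estimates = (ols, ml, mvml)
    return max(estimates) - min(estimates)


# Hypothetical (OLS, ML, MVML) effects for three health authorities.
effects = {
    "HA_1": (-0.10, -0.04, -0.05),
    "HA_2": (0.02, 0.01, 0.01),
    "HA_3": (0.15, 0.06, 0.07),
}

ranges = {ha: sensitivity_range(*est) for ha, est in effects.items()}

# Authorities ordered by OLS effect, as on the diagonal of each figure.
ordered = sorted(effects, key=lambda ha: effects[ha][0])

# Authority whose estimate is most sensitive to the choice of model.
most_sensitive = max(ranges, key=ranges.get)
```

In this toy example the authorities at either end of the ordering show the widest ranges, which is the pattern described for DCRATE rather than a general result.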
The sensitivity of health authority effects to which estimation method is employed is
greater for some performance indicators and health authorities than for others. For
example, estimates for SMR064 (figure 3) seem comparatively stable irrespective of the
estimation technique. Estimates for DCRATE (figure 6) do not differ much for health
authorities with average performance, but they do differ for the very good and the
very bad performers. WTSURG (figure 5) shows relatively high volatility, with this
being apparent across the entire series. For EMOLD (figure 4) the choice of model
specification has substantial implications for a handful of health authorities, which are
not concentrated at any particular location along the series.
The figures illustrate that most of the variation in relative performance stems from a
failure to consider the hierarchical nature of these data. The ML and MVML estimates
are mostly in close agreement, suggesting that correlations across performance
indicators do not have a great impact on the estimates of health authority performance
for our dataset. This is despite the fact that the Likelihood Ratio Test indicates that the
MVML model improves inference, and that we find strong and significant
correlations between some performance indicators.
Conclusion
The analysis presented above is used to illustrate an approach to performance
assessment in the presence of multiple objectives. This is deemed preferable to
standard analytical approaches in which objectives either are amalgamated in some
way to create a single index of overall achievement or are considered in isolation. A
single index requires that objectives are weighted in some way so that they can be
aggregated. The relative value to be placed on the objectives of public sector
organisations is a political issue requiring explicit consideration and we contend that
this matter should not be subsumed as a technical part of the analytical process. By
analysing objectives individually, weighting is unnecessary. However, separate
analysis ignores the possibility that objectives may be related. This leads to a loss of
information and potentially biased measures of performance.
The multivariate approach described in this paper overcomes the main weaknesses of
these two standard approaches. It avoids the need to generate a single index and to
weight objectives, but allows for the possibility that objectives are correlated with each
other. It provides policy makers and managers with information that is easier to
interpret and to act upon than a single index, because estimates of performance are
specific to each objective. The multivariate approach also improves the quality of the
statistical analysis because, unlike more conventional approaches, it exploits
information on the correlation between objectives. This provides insight into the
potential trade-offs or synergies between indicators.
The multivariate approach has further advantages over the use of a single index. The
choice of which independent variables to include can be specific to each objective. In
the analysis conducted here this flexibility is little exploited in that, for most of the
equations, a standard set of regressors is applied. Even so, we are able to identify the
relative influence of these variables for each objective, rather than an average effect.
Aside from being more informative, this is important because the direction and size of
influence may differ according to the objective under consideration.
In addition to the multivariate nature of the analysis, the data considered in this paper
are hierarchical. Serious misrepresentations of relative performance may arise from
estimating aggregate OLS models in which objectives are considered in isolation
because of the failure to account for hierarchical data structures or correlations across
objectives. The extent of this misrepresentation varies according to the objective, but
two general observations can be made. First, it will be greater for those objectives
over which organisations have limited influence, simply because OLS estimates will
be more contaminated by random influences. Second, misrepresentation will be less
for those objectives that are highly correlated in a positive direction with other
measures of performance. In such cases, joint analysis merely reinforces the
assessment of separate models. The sensitivity of estimates of relative performance
will be more substantial in the presence of negative correlations among objectives
because achievement against one measure may be counteracted by lesser
achievement against another. This is likely to be an important source of error when
analysing organisations with multiple conflicting objectives. We conclude, therefore,
that consideration should be given to analysing organisational performance on small
area or individual level data and by considering organisational objectives
simultaneously.
There are limitations with the analysis presented in this paper, the most obvious being
that we have been restricted by the cross-sectional nature of our data. Such snapshots
provide only partial insights about performance in contexts where there are likely to
be important dynamic effects. Current performance is likely to be partially attributable
to past efforts, and some degree of current effort may be directed toward future
attainment. The lags and lead times are likely to vary according to each objective. For
instance, efforts to reduce hospital waiting times may realise an effect more rapidly
than health promotion efforts designed to reduce mortality rates. The development of
a truly dynamic model of performance assessment that is able to recognise both past
inheritances and future investments is a major challenge for future research.
References
Carr-Hill, R.A., Hardman, G., Martin, S., Peacock, S., Sheldon, T.A., Smith, P.C., 1994, A formula for distributing NHS revenues based on small area use of hospital beds (Centre for Health Economics, University of York, York).
Cooper, W.W., Seiford, L.M., Tone, K., 2000, Data envelopment analysis: a comprehensive text with models, applications, references and DEA-solver software (Kluwer Academic Publishers, Boston).
Gilthorpe, M.S., Cunningham, S.J., 2000, The application of multilevel, multivariate modelling to orthodontic research data, Community Dental Health 17, 236-242.
Hamilton, L.C., 1992, Resistant normality check and outlier determination, STATA Technical Bulletin Reprints 1, 86-90.
Hauck, K., Rice, N., Smith, P.C., 2003, The influence of health care organisations on health system performance, Journal of Health Services Research and Policy 8, 68-74.
Kaplan, R.S., Norton, D.P., 2001, Transforming the Balanced Scorecard from performance measurement to strategic management, Accounting Horizons 15, 87-104.
Langford, I., Lewis, T., 1998, Outliers in multilevel models, Journal of the Royal Statistical Society: Series A (Statistics in Society) 161, 121-160.
Nutley, S., Smith, P.C., 1998, League tables for performance improvement in health care, Journal of Health Services Research and Policy 3, 50-57.
Rasbash, J., Browne, W., Goldstein, H., Yang, M., Plewis, I., Healy, M., Woodhouse, G., Draper, D., Langford, I., Lewis, T., 2000, A User's Guide to MLwiN, V2.1 (Institute of Education, London).
Rice, N., Leyland, A., 1996, Multilevel models: applications to health data, Journal of Health Services Research and Policy 1, 154-164.
Smith, P.C., Street, A., 2005, Measuring the efficiency of public services: the limits of analysis, Journal of the Royal Statistical Society: Series A (Statistics in Society) 168, 401-417.
Yang, M., Goldstein, H., Browne, W., Woodhouse, G., 2002, Multivariate multilevel analyses of examination results, Journal of the Royal Statistical Society: Series A (Statistics in Society) 165, 137-146.
Zellner, A., 1962, An efficient method of estimating seemingly unrelated regressions and tests of aggregation bias, Journal of the American Statistical Association 57, 500-579.
Table 1: Performance indicators and socio-economic variables
PERFORMANCE INDICATORS DESCRIPTION
HEALTH OUTCOME
SMR064 Standardised mortality ratio for ages 0-64: Ratio of observed deaths from all causes in an area to the expected equivalent given the local age/sex profile and national averages
SMR674 Standardised mortality ratio for ages 65-74: Ratio of observed deaths from all causes in an area to the expected equivalent given the local age/sex profile and national averages
SIR074 Limiting long standing illness for ages 0-74: Ratio of observed number of people reporting limiting illness in an area to the expected equivalent given the local age/sex profile and national averages
CLINICAL QUALITY
EMOLD Emergency admissions of elderly people: Ratio of the rate of over 65 emergency admissions originating from an area to the expected given the age, sex and specialty of a patient and national averages
DEATHS Deaths following hospital surgery: Ratio of 30 day perioperative mortality after elective and non-elective surgery to the expected equivalent given the age, sex and case severity of a patient
ACCESS
WTSURG Waiting time for routine surgery: Ratio of actual waiting time in days for routine surgery to the expected equivalent given the age, sex and specialty of a patient and national averages
WTRADIO Waiting time for radiotherapy: Ratio of actual waiting time in days for radiotherapy to the expected equivalent given the age, sex and specialty of a patient and national averages
WTLONG Percentage of those on waiting list waiting for 12 months or more: Proportion of elective surgery admissions waiting for more than one year standardised for patient characteristics
GPACCS Accessibility to general practitioners (GPs): Indicator of relative accessibility given the supply of GPs, the distance to surgeries and the competition from local populations
ELECTEPS Number of elective surgery episodes: Ratio of standard surgery procedures originating from an area to the expected equivalent given the age, sex and specialty of a patient
EFFICIENCY
DCRATE Day case rate: Proportion of elective episodes in routine surgery treated as day cases standardised for patient characteristics
MATCOST Maternity costs: Ratio of specialty specific fixed and variable costs for episodes to the expected equivalent given national averages
PSYCOST Psychiatry costs: Ratio of specialty specific fixed and variable costs for episodes to the expected equivalent given the age and sex of a patient and national averages
SOCIO-ECONOMIC VARIABLES
OLDAL Proportion of pensionable age living alone
SCARE Proportion of dependents in single carer households
UNEMP Proportion of economically active unemployed
LOPAR Proportion of households headed by a lone parent
NCARE Proportion of dependents with no carer
NCMW Proportion of persons born in the New Commonwealth
Table 2: Descriptive statistics
Variable                                     Obs     Mean      Std. Dev.   Min       Max
Population details
Ward population                              4972    9644      3614        2041      33073
Health authority population                  186     257805    113618      86777     891745
Number of wards within a health authority    4985    32.348    15          6         95
Performance Indicators
SMR064                                       4972    0.997     0.286       0.173     2.453
SMR674                                       4972    0.997     0.231       0.229     2.424
SIR074                                       4972    0.990     0.210       0.416     2.467
EMOLD                                        4967    1.02      0.362       0.055     9.849
DEATHS                                       4967    0.997     0.422       0.000     258.650
WTSURG                                       4461    1.011     0.206       0.443     1.949
WTRADIO                                      4967    0.933     0.105       0.001     1.000
WTLONG                                       4967    7.376     2.988       0.792     23.913
GPACCS                                       4972    0.528     0.128       0.162     0.969
ELECTEPS                                     4972    1.000     0.432       0.000     3.137
DCRATE                                       4461    0.383     0.083       0.138     0.618
MATCOST                                      4967    0.981     0.387       0.000     5.595
PSYCOST                                      4967    0.994     0.521       0.065     6.190
Socio-economic Variables
OLDAL                                        4985    0.000     0.170       -0.766    0.673
SCARE                                        4985    0.000     0.297       -1.337    0.923
UNEMP                                        4985    0.000     0.500       -1.217    1.704
LOPAR                                        4985    0.000     0.559       -1.945    1.639
NCARE                                        4985    0.000     0.360       -2.109    1.150
NCMW                                         4985    0.000     1.122       -3.287    3.472
PSICK                                        4985    0.000     0.506       -2.007    1.885
Table 3: Coefficient estimates (selected indicators only)
Coefficients with an asterisk* are significant at the 5% level
Standard errors in parentheses
                      OLS model          ML model           MVML model
SMR064   constant     0.998* (0.006)     0.995* (0.006)     0.996* (0.006)
         oldal        0.477* (0.092)     0.237* (0.019)     0.257* (0.018)
         scare        0.339* (0.064)     0.131* (0.010)     0.151* (0.016)
         unemp        0.162* (0.033)     0.333* (0.010)     0.306* (0.010)
EMOLD    constant     1.007* (0.013)     1.008* (0.013)     1.011* (0.013)
         oldal        0.555* (0.207)     0.161* (0.029)     0.208* (0.029)
         scare        -0.315* (0.149)    0.046 (0.026)      0.122* (0.025)
         unemp        0.381* (0.079)     0.214* (0.016)     0.140* (0.015)
         e_outlier    0.815* (0.105)     0.886* (0.108)     0.677* (0.095)
WTSURG   constant     1.012* (0.012)     1.004* (0.014)     0.996* (0.013)
         oldal        -0.858* (0.205)    -0.035* (0.012)    -0.041* (0.012)
         scare        0.137 (0.139)      0.017 (0.011)      0.015 (0.010)
         unemp        -0.091 (0.071)     0.035* (0.007)     0.035* (0.006)
DCRATE   constant     -0.390* (0.006)    -0.383* (0.005)    -0.385* (0.005)
         oldal        0.119 (0.094)      0.038* (0.005)     0.037* (0.005)
         scare        -0.012 (0.064)     0.013* (0.004)     0.013* (0.004)
         unemp        -0.022 (0.033)     -0.007* (0.003)    -0.006* (0.003)
Table 4: Correlation of health authority effects from the multivariate multilevel model
Correlation coefficients with an asterisk* are significant at the 5% level
SMR064 SMR674 SIR074 EMOLD DEATHS WTSURG WTRADIO WTLONG GPACCS ELECTEPS DCRATE MATCOST
SMR674 0.73*
SIR074 0.62* 0.91*
EMOLD 0.00 0.15* 0.05
DEATHS 0.17* 0.30* 0.26* 0.41*
WTSURG -0.16* -0.22* -0.20* -0.13* -0.03
WTRADIO 0.00 0.00 0.16 0.00 0.00 -0.10
WTLONG -0.12 -0.25* -0.21* -0.13 -0.07 0.95* -0.13
GPACCS 0.26 0.47* 0.48* 0.21* 0.21* 0.10 0.00 0.05
ELECTEPS -0.15 -0.32* -0.25* -0.13 -0.13 0.32* -0.07 0.34* -0.07
DCRATE 0.00 -0.18 -0.12 0.00 0.00 0.40* -0.26 0.38* 0.00 0.35*
MATCOST 0.10 0.00 0.10 0.02 -0.04 0.14 0.26* 0.12 0.07 -0.16* -0.15
PSYCOST 0.16 0.15 0.27* 0.09 -0.20* 0.13 0.14* 0.11 0.28* -0.09 -0.05 0.21*
Table 5: Variances and intra-class correlation coefficients
            OLS model      ML model                                                    MVML model
Indicator   $\sigma^2_{u}$   $\sigma^2_{u0}$   $\sigma^2_{u0}+\sigma^2_{e0}$   ICC     $\sigma^2_{u0i}$   $\sigma^2_{u0i}+\sigma^2_{e0i}$   ICC
SMR064      0.006          0.005           0.034                         0.15    0.005            0.034                           0.15
SMR674      0.008          0.006           0.031                         0.19    0.006            0.031                           0.19
SIR074      0.013          0.013           0.024                         0.54    0.013            0.024                           0.54
EMOLD       0.029          0.027           0.097                         0.28    0.029            0.099                           0.29
DEATHS      0.028          0.028           0.104                         0.27    0.029            0.105                           0.28
WTSURG      0.025          0.031           0.041                         0.76    0.032            0.043                           0.74
WTRADIO     0.003          0.005           0.009                         0.56    0.003            0.007                           0.43
WTLONG      5.248          6.097           8.778                         0.69    6.112            8.793                           0.70
GPACCS      0.003          0.003           0.010                         0.30    0.003            0.010                           0.30
ELECTEPS    0.082          0.078           0.187                         0.42    0.078            0.187                           0.42
DCRATE      0.005          0.005           0.007                         0.71    0.005            0.007                           0.71
MATCOST     0.087          0.077           0.137                         0.56    0.076            0.136                           0.56
PSYCOST     0.061          0.071           0.207                         0.34    0.070            0.206                           0.34
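Figure 1: The production possibility frontier: different preferences lead to different weights
[Figure not reproduced: axes y1 and y2; frontier FF; indifference curves IC1 and IC2]
Figure 2: Performance comparisons across multiple dimensions
[Figure not reproduced: axes y1 and y2; frontier FF; points A, C and D; origin 0]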
Figure 3: Sensitivity analysis of health authority effects for ‘Standardized mortality
ratio (ages 0-64)’
Figure 4: Sensitivity analysis of health authority effects for ‘Emergency admissions of
elderly people’
Figure 5: Sensitivity analysis of health authority effects for ‘Waiting time for routine
surgery’
Figure 6: Sensitivity analysis of health authority effects for ‘Day case rate’