
Performance Assessment in the Context of Multiple

Objectives: A Multivariate Multilevel Analysis

Katharina Hauck^a, Andrew Street^b*

^a Centre for Health Economics, Monash University, Clayton, Victoria 3800, Australia

^b Centre for Health Economics, University of York, York YO10 5DD, England

Katharina Hauck: [email protected] Andrew Street: [email protected]

* Corresponding author

Acknowledgements

The authors are grateful to the English Department of Health for financial support. Earlier versions of this paper were presented at the 1st Franco-British Meeting in Health Economics, Paris January 2004 and the 5th iHEA World Congress, Barcelona July 2005. We are grateful for comments from Tony Culyer, Yves-Antoine Flor, Giovanni Forchini, George Leckie, Simon Peck, Nigel Rice, Peter C Smith, participants in the York Seminars in Health Econometrics, and the journal’s referees. The views expressed are those of the authors only.


Abstract

The pursuit of multiple objectives by public sector organisations makes it difficult to

assess and compare their performance. Considering objectives in isolation ignores the

possibility of correlations between objectives, and a single index of performance

requires subjective judgements to be made about the relative value of each objective.

An alternative approach is to estimate a multivariate system of equations in which

objectives are analysed individually but correlations across objectives are considered

explicitly. We analyse the performance of English Health Authorities against thirteen

objectives using hierarchical data for electoral wards that are nested within Health

Authorities. We find evidence of correlation across objectives, suggesting that some

are complementary and others subject to trade-off. The estimates generated when

assessing performance with multivariate multilevel models as compared to ordinary

least squares or multilevel models differ, with the magnitudes varying by objective

and Health Authority.

JEL Classification: C33; D21; I18; L31

Keywords: Multilevel multivariate models; Performance analysis; Public sector

organisations.


Introduction

The pursuit of multiple objectives by public sector organisations makes it

difficult to assess and compare their performance. Commonly, performance

assessment of such organisations is a relatively unsophisticated process, consisting of

measurement against a set of objectives that may or may not be aggregated into a

single index of organisational performance. This type of approach neglects the

possibility that organisational achievement may be correlated across objectives. This

correlation may be positive, if progress against one indicator simultaneously advances

another, perhaps because good management promotes all-round performance. But the

correlation may be negative if trade-offs are involved, such as when resources have to

be diverted from one activity in order to meet some other objective.

In this paper we consider methods designed to take account of inter-

relationships among objectives when assessing organisational performance. We

compare techniques to assess the performance of health authorities in the English

National Health Service against a variety of objectives (performance indicators). Our

data are reported at electoral ward level, with wards being clustered within health

authorities, hence the data have a hierarchical (or multilevel) structure.

We build up our methodology in three stages. First, we ignore the hierarchical

structure of the data and the possibility that correlations may exist across objectives.

Treating the health authority as the unit of analysis, we estimate a multiple regression

model separately for each objective, allowing for the possibility that variations in


performance might be attributable to differences in population demography and socio-

economic conditions. Levels of observed performance above those predicted by the

regression model are interpreted as indicative of above average performance by the

health authority. Second, recognising the ‘wards within health authorities’

hierarchical structure to the data, we separately identify a health authority effect from

a ward-level effect on performance for each objective. Wards are merely geographical

constructs, groupings of the population for electoral purposes. They are not decision-

making organisations. Thus, the ward effect is interpreted as random variation and the

health authority effect as indicative of health authority performance. Finally,

correlations across objectives are considered by estimating a multivariate, multilevel

model that treats each objective as part of a system of equations.

The paper is structured as follows. In the next section we consider the

drawbacks of standard approaches to performance assessment in the presence of

multiple objectives. We then introduce the multivariate approach to performance

assessment. We describe the data in the following section and then detail the models

to be estimated. Estimation results are then presented, together with a comparison of

the results provided by the different estimation procedures in terms of the impact

upon model parameters and estimates of the relative performance of health

authorities. Concluding comments are offered in the final section.

Conventional approaches


In contexts where organisations pursue multiple objectives it is not a simple matter to

assess and make comparative statements about overall performance. Standard

approaches can be classified into three groups.

First, objectives can be considered in isolation, whether individually or as a suite of

performance indicators. As well as there being a danger that only a partial view of

performance is provided, this approach treats objectives as independent by ignoring

any possibility that they might be related. This leads to a loss of information and

potentially biased estimates of performance.

The second type of approach involves imposing a single over-riding objective on

organisations. An example is the balanced scorecard which, when applied to nonprofit

and government agencies, requires “an over-arching objective at the top of their

scorecard that represents their long-term objective” (Kaplan and Norton, 2001). The

multiple sub-ordinate objectives are then “oriented toward improving such a high-

level objective” (Kaplan and Norton, 2001). As well as requiring an over-arching

objective, about which there may not be agreement, the technique does not offer

guidance about how to analyse formally the extent to which the multiple subsidiary

objectives support the main aim and how these might be inter-related. Either these

issues are overlooked or it is left to the judgement of those involved in the

development of an organisation’s balanced scorecard.

The final class of techniques contains those that attempt to combine multiple

objectives into a single index. However the creation of a single index is neither

straightforward nor uncontroversial. There are three fundamental challenges: what


relative value to place on each objective, how to guard against losing information, and

how to model the constraints that organisations face in pursuing each objective. Each

of these is discussed in turn.

Valuation of objectives

As regards relative values, consider organisations that pursue a set of objectives, Y,

comprising I separate objectives, Y={y1, y2, … , yI}. In order to create a single index

of overall organisational performance, it is necessary to combine the separate

components of the objective set in some way. This requires that individual objectives

are weighted such that

$$Y = w_1 y_1 + w_2 y_2 + \dots + w_I y_I = \sum_{i=1}^{I} w_i y_i. \tag{1}$$

This raises the question of what values the weights, w, should reflect. Consider a

production possibility frontier (FF) for two objectives, y1 and y2, and two individuals

that place different values on these objectives, as reflected by the indifference curves

IC1IC1 and IC2IC2 in figure 1. The slopes of these curves at point of tangency with FF

reflect the individuals’ relative valuations of the two objectives and give rise to

different preferred points of production. It cannot be determined what constitutes the

optimal mix of objectives.
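The dependence of a single index on the chosen weights can be illustrated with a minimal sketch. The organisations, scores and weights below are hypothetical and are not taken from the paper; the only element drawn from the text is the weighted sum in equation (1).

```python
# Hypothetical scores on two objectives (higher = better) for two organisations.
scores = {"A": (0.9, 0.4), "B": (0.5, 0.8)}


def composite_index(y, w):
    """Weighted sum Y = sum_i w_i * y_i, as in equation (1)."""
    return sum(w_i * y_i for w_i, y_i in zip(w, y))


def ranking(weights):
    """Organisation names ordered from highest to lowest composite index."""
    return sorted(
        scores,
        key=lambda name: composite_index(scores[name], weights),
        reverse=True,
    )


weights_person_1 = (0.8, 0.2)  # assumed: values objective 1 more highly
weights_person_2 = (0.2, 0.8)  # assumed: values objective 2 more highly

rank_1 = ranking(weights_person_1)
rank_2 = ranking(weights_person_2)
print(rank_1, rank_2)
```

With these illustrative numbers the two sets of weights reverse the ranking of the organisations, which is the difficulty the indifference curves in figure 1 describe.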

Figure 1 around here

One way to resolve this problem would be to impose a single set of weights, the

source of which might depend on the context of the analysis. Apart from the practical

and technical difficulties associated with the construction of such weights, applying a


single set of weights will not be appropriate in contexts where organisations enjoy

discretion over what emphasis to place on different objectives. For example, such an

approach would be contrary to policies aimed at decentralising decision-making or

encouraging organisations to be more responsive to the concerns of the populations or

client groups that they serve.

The extreme alternative is to allow weights to vary freely, so that they are specific to

each organisation. This is standard practice in applications of Data Envelopment

Analysis (DEA), a technique that has been developed to allow assessment of

organisations that pursue multiple objectives. DEA allows weights to vary such that

each organisation is seen in its best possible light. Some see this as an attractive

feature of the methods (Cooper et al., 2000) but the implication is that it limits

analytical discriminatory power, such that organisations are more difficult to

distinguish in terms of their relative overall performance. Returning to the example

illustrated in figure 1, with unconstrained weights, it will not be possible to

discriminate among any of the organisations located on the production possibility

frontier, FF.
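The loss of discriminatory power under freely varying weights can be sketched as follows. This is not a DEA linear programme: it is a simple grid search over two non-negative weights, and the four organisations and their achievements are hypothetical. Each organisation is scored by the most favourable ratio of its own index to the best index in the sample.

```python
# Hypothetical achievements on two objectives (higher = better).
orgs = {"A": (0.9, 0.3), "B": (0.6, 0.7), "C": (0.2, 0.95), "D": (0.5, 0.5)}


def best_light_score(name, steps=100):
    """Highest ratio of an organisation's index to the best index in the sample,
    searching over weights (w, 1 - w) with w on a grid in [0, 1]."""
    best = 0.0
    for s in range(steps + 1):
        w = s / steps
        index = {k: w * y1 + (1 - w) * y2 for k, (y1, y2) in orgs.items()}
        best = max(best, index[name] / max(index.values()))
    return best


free_scores = {name: round(best_light_score(name), 3) for name in orgs}
print(free_scores)
```

In this sketch A, B and C each attain a score of 1 under some set of weights and so cannot be ranked against one another; only D, which is outperformed by B on both objectives, scores below 1.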

But the inability of DEA to discriminate among organisations is more serious than

implied by figure 1. This is because the location of the production possibility frontier

is, of course, unknown. DEA overcomes this by constructing an ‘empirical’ frontier

that envelopes those observations furthest from the origin. Suppose that organisation

A is located on the true (unobserved) production possibility frontier, as in figure 2,

and that we wish to compare its performance to that of organisation B, also pursuing

these two objectives. Only if organisation B is located within the shaded area OCAD


will DEA consider that the performance of organisation A is superior to that of

organisation B. In this case, subject to there being but two organisations, the DEA

frontier appears as CAD. If organisation B is located anywhere else, its position will

be used to construct a different DEA frontier on which both organisations are located.

Thus, by definition, the performance of the two organisations will be deemed

equivalent (footnote 1).

Figure 2 around here

Loss of transparency

The second, but related, challenge with the use of a single index is that important

information may be ‘lost’, and this may have implications both for external bodies

that wish to judge organisational performance and for the organisations themselves.

The use of a single index is appealing to external bodies because it promises to

simplify the assessment process. Superficially it appears much less demanding to

make judgments on the basis of a single index than to have to grapple with several

dimensions of performance. But this requires that contentious technical aspects

involved in constructing the single index are understood and agreed upon. This is a

particular issue when the index is to be used for regulatory purposes, when the choice

of weights is properly a political decision. In such circumstances, allowing weights to

vary freely or allowing analysts to select the weights can be considered an abrogation

of political responsibility (Smith and Street, 2005).

Footnote 1: This problem cannot be resolved by using another popular technique, stochastic frontier analysis (SFA). Applications of SFA in a multiple objective context that do not explicitly address the weighting issue entail that objective weights simply reflect the sample average values. This is not an appealing implication, particularly given that, in undertaking the analysis, there is a pre-supposition of the presence of sub-optimal performance.


From an organisational perspective, if performance assessment is to engender

behavioural change, it is essential that the assessment technique provides clear

messages (Nutley and Smith, 1998). The use of a single index carries the risk that

important information will be difficult to access. It is not immediately apparent to

organisations how they perform on the specific performance dimensions that have

been amalgamated into the single index and, hence, where they should focus their

attention. Moreover, if the index is based on unconstrained weights, organisations will

not be able separately to identify sub-optimality in performance from differences in

the relative values placed upon objectives. For example, in applying DEA with

unconstrained weights, it is possible to place an organisation in its best possible light

simply by assigning a zero weight to those objectives on which it performs poorly.

Exogenous constraints

The third problem concerns how to account for exogenous constraints. A major

challenge in any application of performance assessment is to be able to attribute

differences in observed performance to differences in organisational effort. The

problem is that observed performance may be explained, at least partially, by factors

over which organisations may have little control, such as the environment in which

they operate or the population they serve. It is possible to condition performance on

such factors and to estimate their importance, most obviously through regression

analysis. But if performance across multiple objectives is assessed using a single

index, it is possible to discern only an average effect. It may be that each objective is

influenced by different constraining factors or that the influence of a specific


constraint varies according to each objective. More specific modelling of these

constraints facilitates a more accurate assessment of organisational performance.

The multivariate approach

Given these drawbacks, a middle way between the analysis of objectives in isolation

and the creation of a single index is recommended. This can be achieved by applying

multivariate or seemingly unrelated regression (SUR) techniques to consider I

objectives by joint estimation of a system of equations of the following form (Zellner,

1962):

$$y_{ik} = \beta_{0i} + \mathbf{x}_{1ik}\boldsymbol{\beta}_{1i} + u_{ik}, \qquad i = 1,2,\dots,I;\; k = 1,2,\dots,K, \tag{2}$$

where $y_{ik}$ is the performance indicator for the i-th objective for the k-th organisation, $\beta_{0i}$ is a coefficient, $\mathbf{x}_{1ik}$ is a $1 \times q_i$ vector of $q_i$ regressors specific to objective i, $\boldsymbol{\beta}_{1i}$ is a $q_i \times 1$ vector of coefficients, and $u_{ik}$ is an error with $E(u_{ik}) = 0$. By stacking

the k organisations above each other, the multivariate model for the set of I objectives

may be written as:

$$\mathbf{y}_i = \boldsymbol{\beta}_{0i} + \mathbf{X}_{1i}\boldsymbol{\beta}_{1i} + \mathbf{u}_i, \qquad i = 1,2,\dots,I, \tag{3}$$

or:

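$$
\begin{pmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_I \end{pmatrix}
=
\begin{pmatrix} \boldsymbol{\beta}_{01} \\ \boldsymbol{\beta}_{02} \\ \vdots \\ \boldsymbol{\beta}_{0I} \end{pmatrix}
+
\begin{pmatrix}
\mathbf{X}_{11} & 0 & \cdots & 0 \\
0 & \mathbf{X}_{12} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \mathbf{X}_{1I}
\end{pmatrix}
\begin{pmatrix} \boldsymbol{\beta}_{11} \\ \boldsymbol{\beta}_{12} \\ \vdots \\ \boldsymbol{\beta}_{1I} \end{pmatrix}
+
\begin{pmatrix} \mathbf{u}_1 \\ \mathbf{u}_2 \\ \vdots \\ \mathbf{u}_I \end{pmatrix},
\tag{4}
$$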

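where $\mathbf{y}_i$, $\boldsymbol{\beta}_{0i}$ and $\mathbf{u}_i$ are $K \times 1$ vectors, $\mathbf{X}_{1i}$ is a $K \times q_i$ matrix, and $\boldsymbol{\beta}_{1i}$ is a $q_i \times 1$ vector.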


If the performance of an organisation k on two objectives i and p is related by

unobservable factors (e.g. managerial decisions), then $u_{ik}$ would be correlated with $u_{pk}$ for i ≠ p. By estimating a multivariate model, we allow for such correlation. This implies that:

$$E(u_{ik}\,u_{ph}) = \sigma_{ip} \text{ if } k = h \text{ and } 0 \text{ otherwise}, \tag{5}$$

where k and h denote two different organisations.

In addition to explicit consideration of correlations between objectives, a major

advantage of the multivariate method for performance assessment in the context of

multiple objectives is that it does not require objectives to be weighted because

information on relative performance is provided specifically for each objective.
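The cross-equation correlation that the multivariate model allows for can be illustrated with a minimal sketch. It does not perform SUR estimation: it fits each equation separately by least squares and then computes the sample analogue of $\sigma_{ip}$ in equation (5) from the residuals. The data for six organisations, one regressor and two objectives are hypothetical.

```python
def ols_residuals(x, y):
    """Residuals from a simple regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    return [b - intercept - slope * a for a, b in zip(x, y)]


def cross_covariance(u_i, u_p):
    """Sample analogue of sigma_ip = E(u_ik * u_pk): mean product of residuals."""
    return sum(a * b for a, b in zip(u_i, u_p)) / len(u_i)


# Hypothetical data for K = 6 organisations.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y1 = [2.1, 2.9, 4.4, 4.8, 6.3, 6.9]
y2 = [1.05, 1.45, 2.2, 2.4, 3.15, 3.45]

u1 = ols_residuals(x, y1)
u2 = ols_residuals(x, y2)
sigma_12 = cross_covariance(u1, u2)
print(sigma_12)
```

A non-zero value of `sigma_12` indicates that organisations doing unexpectedly well (or badly) on one objective tend to do so on the other, which is the information that equation-by-equation analysis discards.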

In the remainder of the paper we apply this technique to the analysis of health

authorities operating in the English National Health Service.

Data

Thirteen indicators of health authority performance are available (table 1). Variously,

these address health system objectives relating to NHS performance domains ‘health

outcomes’, ‘clinical quality’, ‘access’ and ‘efficiency’. All have been used in official

assessments of NHS performance but the set of indicators analysed here is not

intended as an exhaustive list of health system objectives (Hauck et al., 2003). Most

of the indicators are expressed as ratios, in which the actual value is standardised

against national values to take account of the composition of the local population in


terms of age, gender and case mix. For all indicators a higher value implies worse

performance.

Table 1 around here

Even after standardisation, differences in performance may arise from additional

variations in population characteristics. To account for this possibility, we condition

the performance indicators on various socioeconomic factors which have been shown

to be associated with the needs of the population and the utilisation of health care

(Carr-Hill et al., 1994). For the majority of the performance indicators, we condition

on factors used by the English Department of Health to estimate population

requirements for acute care resources. These variables are the proportion of those of

pensionable age living alone (OLDAL), the proportion of dependents in single carer

households (SCARE), and the proportion of the economically active population that is

unemployed (UNEMP).

Socioeconomic characteristics used by the Department of Health to estimate

population requirements for non-acute services are taken into account when

considering performance with respect to controlling the costs of psychiatry services

(Carr-Hill et al., 1994). These are the proportion of those of pensionable age living

alone (OLDAL), the proportion of persons in lone parent households (LOPAR), the

proportion of dependents in households with no carer (NCARE), and the proportion

of residents born in the New Commonwealth (NCMW). All explanatory variables are

centred around zero by subtracting the grand mean from the value for each ward.
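Grand-mean centring can be sketched in a few lines. The ward values below are hypothetical; only the variable name UNEMP comes from the text.

```python
def centre(values):
    """Subtract the grand mean so that the centred variable averages zero."""
    grand_mean = sum(values) / len(values)
    return [v - grand_mean for v in values]


# Hypothetical ward-level unemployment proportions (UNEMP).
unemp = [0.04, 0.10, 0.07, 0.12, 0.02]
unemp_centred = centre(unemp)
print(unemp_centred)
```

With centred regressors, the intercept of a regression can be read as the expected value of the indicator for a ward with average socio-economic characteristics.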


The dataset is merged from several sources, including the 1991 census of population,

NHS administrative data, the 1990/91 Hospital Episode Statistics, a database of all

hospital inpatient episodes, and the 1990/91 Health Service Indicators package, which

includes summary indicators of health authority performance. The data are available

for each of the 4985 English electoral wards (although there are some missing data).

Wards have an average population of some 9,600 and are clustered geographically

within 186 health authorities covering populations of around 250,000. Our interest is

in assessing the contribution to observed performance made by these health

authorities. Descriptive data are provided in table 2.

Table 2 around here

Methods

Aggregate OLS models

For comparison purposes, we first estimate a conventional multiple regression model

for each performance indicator in which the population-weighted ward-level data are

aggregated to the relevant health authority. The health authority is thereby considered

the unit of analysis and the model takes the following form:

$$y_k = \beta_0 + \mathbf{x}_{1k}\boldsymbol{\beta}_1 + u_k, \qquad k = 1,2,\dots,K, \tag{6}$$

where $y_k$ represents the population-weighted mean ward value of performance indicator y across all wards in the k-th health authority, and $\mathbf{x}_{1k}$ represents a vector of socio-economic variables in the k-th health authority. A positive slope parameter $\beta_1$


can be interpreted as suggesting that worse socio-economic conditions are associated

with levels of performance worse than expected given the age and sex standardised

population characteristics of the health authority.

The term $u_k$ is the random error for the k-th health authority, assumed to have zero mean and constant variance ($\sigma^2_u$). We can interpret $u_k$ as the parallel departure from the mean regression line of the k-th health authority. Small estimated values of $u_k$ indicate health authorities with close to average performance, after controlling for the socio-economic situation of their wards. Large positive (negative) estimated values of $u_k$ represent health authorities with an observed performance markedly worse (better) than that predicted on the basis of demographic standardisation and socio-economic conditions. If there is a large variation in $u_k$, this suggests marked differences in performance among health authorities. The model is estimated by ordinary least squares using Stata 7.
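The aggregation step that precedes the OLS regression can be sketched as follows. The ward records are hypothetical, and the function name `weighted_means` is introduced here purely for illustration; the sketch only shows how a population-weighted mean per health authority is formed.

```python
from collections import defaultdict

# Hypothetical ward records: (health authority, ward population, indicator value).
wards = [
    ("HA1", 8000, 1.10),
    ("HA1", 12000, 0.90),
    ("HA2", 5000, 1.20),
    ("HA2", 15000, 1.00),
]


def weighted_means(records):
    """Population-weighted mean indicator for each health authority."""
    total_pop = defaultdict(float)
    total_val = defaultdict(float)
    for ha, pop, value in records:
        total_pop[ha] += pop
        total_val[ha] += pop * value
    return {ha: total_val[ha] / total_pop[ha] for ha in total_pop}


y_bar = weighted_means(wards)
print(y_bar)
```

Each health authority contributes a single observation to the regression after this step, so all within-authority variation across wards is averaged away.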

This formulation ignores the hierarchical structure of the data by taking means

(weighted for population size) for $y_k$ and $\mathbf{x}_{1k}$ over all wards within a health authority.

There are a number of drawbacks to this (Rice and Leyland, 1996). First, we fail to

make full use of the available information. Second, there is a danger of committing

the aggregation (or ecological) fallacy, in which a relationship found at the aggregate level

may not exist among the individuals (electoral wards in this case) from which the data

have been aggregated. For example, the average wait for hospital admission for

residents of a particular health authority may be no different from the national

average. However, this may disguise the possibility that waiting times depend on


where people live within this health authority, with people in one ward facing

substantially longer waits while those in another ward enjoy substantially better

access than average. Third, failure to account for clustering results in underestimates

of standard errors, which undermines significance tests. Fourth, the estimate of $u_k$

captures random variation as well as managerial differences between health

authorities and is therefore an unsatisfactory measure of performance.

Multilevel (ML) models

Recognising the hierarchical structure of the data, we next analyse the data at electoral

ward level, these wards being merely geographical (rather than organisational)

constructs. Wards are clustered within health authorities and these wards are likely to

share closer similarities to one another than to wards elsewhere as they are similarly

affected by the decision making of the health authority to which they belong. Also,

they may share unobserved socio-demographic influences over and above those

included in the model. Importantly for estimation, this clustering implies that wards

cannot be considered independent observations. Rather their common (perhaps

unobservable) similarities imply correlation among wards within each health

authority, and this correlation invalidates classical OLS estimation because the iid

assumption is not met. To account for inter-dependence among wards, we define a

two-level random intercept model as follows:

$$y_{jk} = \beta_0 + \mathbf{x}_{1jk}\boldsymbol{\beta}_1 + u_{0k} + e_{0jk}, \qquad j = 1,2,\dots,J;\; k = 1,2,\dots,K, \tag{7}$$

where $y_{jk}$ represents performance indicator y in the j-th ward within the k-th health authority, and $\mathbf{x}_{1jk}$ represents the socio-economic conditions in the j-th ward in the k-th health


authority. The size of wards varies considerably (range 2,041 – 33,073 people) so

estimation incorporates population weights. In this context, weights are applied to the

error components (Hauck et al., 2003). The weights applied at ward-level are calculated as $w_j = n_{jk}^{-1/2} / \mathrm{ave}(n_{jk}^{-1/2})$ and those at health authority level as $w_k = n_k^{-1/2} / \mathrm{ave}(n_k^{-1/2})$, where $n_{jk}$ is the population in ward j in health authority k, $n_k$ is the population in health authority k, and ave(.) denotes the average across the quantities contained in parentheses.

The terms $u_{0k}$ and $e_{0jk}$ are error components such that $u_{0k}$ relates to the k-th health authority (interpreted as the k-th health authority’s level of relative performance) and $e_{0jk}$ is the random error for the j-th ward within the k-th health authority. Both error components are assumed to have zero mean and constant variance ($\sigma^2_{u0}$, $\sigma^2_{e0}$). We place the same interpretation upon $u_{0k}$ as we did on $u_k$ under the earlier formulation,

but the two estimates are notably different. Most importantly the greater the extent of

clustering, the greater will be the underestimation of standard errors of fixed part

parameters by OLS models (Rice and Leyland, 1996). The ML formulation may also

lead to a change in the relative performance of health authorities. This will depend on

the extent to which performance varies among health authorities and on the extent to

which health authorities are able to influence the performance indicator in question.

The intra-class correlation coefficient, ICCML, is used to assess the proportion of total

variance attributable to health authority influence, and is calculated as:

$$ICC_{ML} = \sigma^2_{u0}\,(\sigma^2_{u0} + \sigma^2_{e0})^{-1}, \qquad 0 < ICC_{ML} < 1. \tag{8}$$

Larger values of $ICC_{ML}$ are indicative of greater potential for health authorities to

influence the value of the relevant performance indicator (Hauck et al., 2003).
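The intra-class correlation is simple to compute once the two variance components have been estimated. The sketch below uses hypothetical variance components, not estimates from the paper, and the function name `icc` is introduced for illustration.

```python
def icc(var_authority, var_ward):
    """Share of total variance attributable to the health authority level."""
    return var_authority / (var_authority + var_ward)


# Hypothetical variance components, not estimates from the paper.
icc_example = icc(var_authority=0.02, var_ward=0.18)
print(icc_example)
```

In this illustration one tenth of the unexplained variation lies between health authorities, with the remainder being ward-level variation within authorities.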


The computer package MLwiN beta version 2.0 is used for estimation (Rasbash et

al., 2000).

Multivariate multilevel (MVML) model

The multilevel model described above is calibrated separately for each performance

indicator. This ignores the possibility that levels of achievement might be correlated

across indicators. This correlation may be positive, if progress against one indicator

simultaneously advances another, perhaps because good management promotes all-

round performance. But the correlation may be negative if trade-offs are involved,

such as when scarce resources that might be employed to achieve one objective are re-

directed to pursue another. Analysis that recognises the possibility of simultaneity in

the pursuit of multiple objectives uses additional information on the correlations to

generate superior measures of organisational achievement than a piecemeal analysis.

The multilevel framework can be extended to consider multiple outcomes simply by

recognising that the performance indicators themselves are clustered, in this context

within wards (Gilthorpe and Cunningham, 2000, Yang et al., 2002). This is a

multivariate model in a multilevel context. By considering the performance indicators

as the lowest tier in the data hierarchy, the possibility of within-ward and within-

health authority correlation among indicators can be assessed. Thus the multivariate

multilevel model (MVML model) is conceptualised as a three-level multilevel model,

in which the set of I performance indicators (level 1) are clustered within J wards


(level 2), which are themselves clustered within K health authorities (level 3). The

MVML model can be written as:

$$y_{ijk} = \beta_{0i} + \mathbf{x}_{1ijk}\boldsymbol{\beta}_{1i} + u_{0ik} + e_{0ijk}, \qquad i = 1,2,\dots,I;\; j = 1,2,\dots,J;\; k = 1,2,\dots,K. \tag{9}$$

Thus, yijk is the i-th performance indicator for the j-th ward clustered within the k-th

health authority. The other parameters are analogous to their counterparts in the

aggregate OLS and ML models, except that we now consider an additional level i.

The error terms $u_{0ik}$ and $e_{0ijk}$ are both assumed to be normally distributed with zero mean and constant variance ($\sigma^2_{u,i}$, $\sigma^2_{e,i}$) for each indicator. The ward level error $e_{0ijk}$ represents the random error for performance indicator i in the j-th ward. We allow for the possibility that performance indicators may be correlated within the same wards, with $\mathrm{cov}(e_{0ijk}, e_{0pjk}) = \sigma_{e,ip}$, but assume that a performance indicator is independent across wards in different health authorities, hence $\mathrm{cov}(e_{0ijk}, e_{0igh}) = 0$. This latter

assumption is restrictive in that neighbouring wards in different health authorities may

share unobserved characteristics that influence performance, but we lack the

geographical information to allow for this possibility.

The health authority effect is captured by $u_{0ik}$. The covariance for the i-th and p-th performance indicators within a health authority k is given by:

$$\mathrm{cov}(u_{0ik}, u_{0pk}) = \sigma_{u,ip}. \tag{10}$$

These estimates of covariance can be used to calculate the degree of correlation rip

between performance indicator i and p:


$$r_{ip} = \frac{\sigma_{u,ip}}{\sqrt{\sigma^2_{u,i}\,\sigma^2_{u,p}}}. \tag{11}$$

If the correlation is positive, this implies that a health authority that has better than

average performance for indicator i also has above average performance for indicator

p. A negative correlation implies that above average performance for the one indicator

coincides with poorer performance for the other. This correlation is interpreted as

being due to unobservable influences on performance, such as the managerial

competency of the health authority or the shared influence of environmental

conditions over and above those factors that we have controlled for.
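Converting an estimated covariance into a correlation can be sketched as below. The sketch assumes the standard definition of a correlation coefficient, namely the covariance divided by the product of the two standard deviations, and the variance and covariance values are hypothetical.

```python
import math


def correlation(cov_ip, var_i, var_p):
    """Correlation between the health authority effects for indicators i and p,
    using the standard definition: covariance over product of standard deviations."""
    return cov_ip / math.sqrt(var_i * var_p)


# Hypothetical health-authority-level variances and covariance.
r_example = correlation(cov_ip=-0.03, var_i=0.04, var_p=0.09)
print(r_example)
```

Because a higher value implies worse performance on every indicator in this dataset, a negative value such as the one in this illustration would correspond to a trade-off between the two objectives.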

Consistent with the ML models, we estimate the intra-class correlation coefficient as

$$ICC_{MVML} = \sigma^2_{u,i}\,(\sigma^2_{u,i} + \sigma^2_{e,i})^{-1}, \qquad 0 < ICC_{MVML} < 1. \tag{12}$$

If there are correlations among performance indicators, the residuals from the ML models, $u_{0k}$, and the residuals from the MVML model, $u_{0ik}$, may differ for the same performance indicator i.

We conduct a Likelihood Ratio Test to determine whether the correlations among

residuals are jointly zero or not. The ML models are the restricted models because

they impose 78 (= $(13^2 - 13)/2$) zero correlation assumptions on the residuals. The

test statistic is given as:

IiLLFLLF

I

i

MLMVML,...,1,2

1

=!!"

#$$%

&!"

#$%

&'= (

=

) (13)

where $LLF_{MVML}$ is the log-likelihood function for the multivariate multilevel model,

and $LLF_{ML}$ is the log-likelihood function for a multilevel model applied to a single

performance indicator. Asymptotically, $\lambda$ has a chi-square distribution. A significant

test statistic indicates that estimation as a MVML model is preferable to separate

estimation of a set of ML models, and implies the presence of correlation among

performance indicators.
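The mechanics of the test can be sketched as follows. This is an illustration only: the log-likelihood values are hypothetical, the function names are our own, and the p-value uses the Wilson-Hilferty normal approximation to the chi-square distribution rather than an exact routine, so that the example needs nothing beyond the Python standard library.

```python
import math


def n_zero_correlation_restrictions(n_indicators):
    """Number of pairwise correlations set to zero by separate ML models."""
    return n_indicators * (n_indicators - 1) // 2


def likelihood_ratio_statistic(llf_ml_list, llf_mvml):
    """Minus twice the gap between the summed ML and the MVML log-likelihoods."""
    return -2.0 * (sum(llf_ml_list) - llf_mvml)


def chi2_upper_tail_approx(x, df):
    """Approximate P(chi-square with df degrees of freedom > x), Wilson-Hilferty."""
    if x <= 0:
        return 1.0
    scale = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - scale)) / math.sqrt(scale)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical log-likelihoods for three indicators (not values from the paper).
llf_ml = [-1200.0, -950.0, -1010.0]  # separate ML models, one per indicator
llf_mvml = -3100.0                   # joint MVML model

lam = likelihood_ratio_statistic(llf_ml, llf_mvml)
df = n_zero_correlation_restrictions(len(llf_ml))
p_value = chi2_upper_tail_approx(lam, df)
print(f"lambda = {lam:.1f}, df = {df}, approximate p = {p_value:.3g}")
```

With 13 indicators, `n_zero_correlation_restrictions(13)` gives the 78 restrictions referred to above.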

Accounting for outliers

In applying the OLS, ML and MVML specifications we test for the presence of highly

atypical health authorities using a procedure based on the interquartile range

suggested by Hamilton (1992). The procedure defines ‘severe’ outliers as

$u_{\cdot} < Q(25) - 3\,IQR$ or $u_{\cdot} > Q(75) + 3\,IQR$, where Q(25) and Q(75) are the 25th and

75th percentiles of the distribution of $u_{\cdot}$, and IQR is the interquartile range. Severe

outliers comprise about 0.0002% of a normal population. The presence of a severe

outlier in samples of n<300 is considered sufficient evidence to reject normality at a

5% significance level (Hamilton, 1992). The outlier test is applied for each

performance indicator, with an observation defined as an outlier for the relevant

indicator if it appears as such under at least one of the three specifications. Outliers

are identified for only four of the performance indicators. For EMOLD, 3 health

authorities are defined as upper outliers (‘very bad’ performers); for DEATHS, 11

health authorities are lower outliers (‘very good’ performers, reporting zero deaths

after surgery); for WTRADIO, 5 health authorities are lower outliers (‘very good’

performers with very short waiting times); and for ELECTEPS, 6 health authorities

are lower outliers (‘very good’ performers with a high number of elective episodes).

Rather than dropping these observations, it is more appropriate to capture their

atypical influence by including a dummy variable (=1 if HA is outlier, =0 if not) that

is specific to each health authority and to each of the four performance indicators

affected (Langford and Lewis, 1998). Inclusion of the dummy variables conditions

residual variances on variations in performance which are non-normal, so that

variations in health authority effects for the affected authorities can be interpreted as

variations in performance. We adopted this procedure in estimating the OLS and ML

models, but the MVML model failed to iterate successfully with the addition of 24

dummy variables. To overcome this, a single dummy variable is applied for all the

outlying health authorities for each indicator in order to capture their average effect.

The ML model estimates are assumed to capture the individual effects for these

outliers.
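The outlier rule and the associated dummy variable can be sketched as below. This is a minimal illustration under stated assumptions: the residuals are hypothetical, the function name `severe_outliers` is our own, and quartiles are computed with the ‘inclusive’ method of Python's `statistics.quantiles`, which may differ slightly from the quartile convention used in Hamilton's procedure.

```python
import statistics


def severe_outliers(values):
    """Indices of values more than 3 IQR below Q(25) or above Q(75)."""
    q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    lower, upper = q25 - 3 * iqr, q75 + 3 * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical health authority residuals for one performance indicator.
residuals = [-0.2, -0.1, -0.05, 0.0, 0.05, 0.1, 0.2, 2.5]

flagged = severe_outliers(residuals)

# Dummy variable: 1 if the health authority is a severe outlier, 0 otherwise.
outlier_dummy = [1 if i in flagged else 0 for i in range(len(residuals))]
print(flagged, outlier_dummy)
```

The single dummy per indicator used in the MVML model corresponds to `outlier_dummy`; the authority-specific dummies used in the OLS and ML models would instead need one indicator column per flagged authority.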

Results

Model parameters

Estimation results are shown in table 3. As examples, we select one indicator from

each of the four performance domains – SMR064, EMOLD, WTSURG, and

DCRATE.² The aim of the models is to control for influences upon performance that

are considered exogenous to health authority control rather than to explain variation in

each performance indicator. Hence, primary interest is in the value of the residuals

and not in the parameter estimates themselves. Nevertheless some features are worth

noting. First, parameter estimates are in close agreement for the ML and MVML

models, but differences are evident between these models and the aggregate OLS

models. These differences reflect errors in estimation that arise from a failure to

account for the hierarchical nature of the data, with the relationships estimated by the

aggregate model being contaminated by ward-level effects.

² Results for the other indicators are available from the authors on request.

Table 3 around here

Second, for the majority of performance indicators, parameter estimates are

significantly positive, implying that areas with worse than average socio-economic

conditions are likely to have levels of achievement worse than that expected given the

age-sex standardised characteristics of the population. This suggests that age-sex

standardisation alone is insufficient to capture the differences in the characteristics of

ward populations.

Third, not only the size but sometimes also the direction of influence varies according

to the performance indicator. For example, as would be expected, health authorities

with a higher proportion of elderly people living alone (OLDAL) are likely to have a

higher proportion of elderly people admitted to hospital as emergencies (EMOLD).

But such health authorities are likely to have lower waiting times (WTSURG),

perhaps because the elderly are less likely to be placed on surgical waiting lists.

Correlations across performance indicators

The Likelihood Ratio test comparing the ML and MVML models clearly rejects the

null hypothesis of jointly zero correlations among the residuals ($\lambda_{df=78}$ = 9,495,

p < 0.001). This indicates that the MVML model improves inference by allowing

explicitly for correlations among the performance indicators. The correlation

coefficients for the health authority effects across the various indicators are presented

in table 4. Coefficients with an asterisk are significant at the 5% level.

Table 4 around here

We find statistically significant positive correlations among the health outcome

indicators, SMR064, SMR674 and SIR074. These correlations imply that in an area

with above-average mortality rates for ages 0-64, mortality rates for ages 65-74 and

rates of chronic illness will be above average also. There is a statistically significant

positive correlation (rip = 0.41) between the two clinical quality indicators, DEATHS

and EMOLD, implying that areas with a higher proportion of emergency admissions

also report more deaths following hospital surgery. There is an almost perfect

correlation between WTSURG and WTLONG (rip = 0.95), suggesting that one of

these waiting times indicators is redundant.

These two measures of waiting time have a significant negative correlation with the

health outcome measures (from rip = -0.25 to rip = -0.16), which might be indicative of

trade-offs between these broad types of objectives: efforts directed at reducing

waiting times may have adverse consequences for these measures of health outcome.

There is also a negative correlation between health outcomes and the number of

elective episodes (from rip = -0.32 to rip = -0.19), which means that, in health

authorities with higher rates of illness and mortality, more elective procedures are

undertaken. In contrast, there is a significant positive correlation between the health

measures and the indicator measuring accessibility to GPs (from rip = 0.47 to rip =

0.48). This suggests that in areas with above average illness and mortality rates people

experience greater difficulties in accessing GP services.

Variations in performance between health authorities

Taking account of the hierarchical nature of the data and of the possibility of

correlation among performance indicators may have an impact on the extent to which

variations in performance can be attributed to health authorities. Table 5 provides

details of the variance components for the three models. For the ML and MVML

models the intra-class correlation coefficient ICC provides an indication of the extent

to which variations in performance, after conditioning upon socio-economic factors,

are attributable to health authorities.

This varies according to the performance indicator. It appears that health authorities

have a substantial role in determining performance as measured by waiting times and

day case rates. Around 70% of small area variation in performance against these

indicators is being attributed to health authorities, after controlling for variation due to

differences in socioeconomic conditions. In contrast, the impact of health authorities

on mortality rates is considerably less, explaining only 15% of the variation in

mortality rates for the under 65s and 19% of that for those in the 65-74 age group.

These estimates of health authority effects are little changed by the move to the

MVML formulation.

Table 5 around here

It is important to bear in mind that measurement error may result in performance

variations being over- or underestimated and wrongly attributed to the health

authority level. For example, systematic variations in data collection, actions of other

geographically defined agencies, or the influence of national public health policies

may lead to a mistaken attribution of the health authority effect as being due solely to

differences in health authority performance (Hauck et al., 2003). By a similar token,

variations across wards within each health authority may not be truly random. For

example, there may be systematic differences in the way that patients from different

wards within their shared health authority are treated by those who make decisions

that affect them, whether these be general practitioners, hospital staff, or local

managers. Such behaviour is most likely to be captured by the ward-level effect here,

though it would be more desirable to attribute it to the health authority. This is a

considerable challenge because such behaviour would be difficult to measure

accurately even were discrimination to be reasonably overt.

Sensitivity analysis of health authority effects

Taking account of the hierarchical nature of the data and of the possibility of

correlation among performance indicators may also have an impact on the estimates

for particular health authorities. The sensitivity of the relative performance of each

health authority to specification decisions can be illustrated graphically. Figures 3 to 6

plot the health authority effect estimated by the three models for SMR064, EMOLD,

WTSURG, and DCRATE. The health authority effect for the aggregate OLS model,

$u_k$, is plotted on the diagonal from the ‘bottom left corner’ (best performance) to the

‘top right corner’ (worst performance) of each figure. The health authority effects

deriving from the ML ($u_{0k}$) and MVML ($u_{0ik}$) models are indicated respectively by a

circle and a triangle. The vertical lines connecting these points depict the range in

values for each individual health authority, with longer lines indicating greater

sensitivity in individual values to the choice of model specification.
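The length of each vertical line is simply the range of the three estimates for a given health authority. A minimal sketch of that calculation follows; the effect estimates and the function name `sensitivity_ranges` are hypothetical and serve only to show the arithmetic.

```python
def sensitivity_ranges(ols, ml, mvml):
    """Range (max minus min) of the three effect estimates for each authority."""
    return [max(est) - min(est) for est in zip(ols, ml, mvml)]


# Hypothetical health authority effects from the three specifications.
ols_effects = [-0.10, 0.00, 0.12]
ml_effects = [-0.08, 0.01, 0.05]
mvml_effects = [-0.09, 0.01, 0.06]

ranges = sensitivity_ranges(ols_effects, ml_effects, mvml_effects)

# Index of the authority whose estimate is most sensitive to the specification.
most_sensitive = max(range(len(ranges)), key=ranges.__getitem__)
print(ranges, most_sensitive)
```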

The sensitivity of health authority effects to which estimation method is employed is

greater for some performance indicators and health authorities than for others. For

example, estimates for SMR064 (figure 3) seem comparatively stable irrespective of the

estimation technique. Estimates for DCRATE (figure 6) do not differ much for health

authorities with average performance, but they do differ for the very good and the

very bad performers. WTSURG (figure 5) shows relatively high volatility, with this

being apparent across the entire series. For EMOLD (figure 4) the choice of model

specification has substantial implications for a handful of health authorities, which are

not concentrated at any particular location along the series.

The figures illustrate that most of the variation in relative performance stems from a

failure to consider the hierarchical nature of these data. The ML and MVML estimates

are mostly in close agreement, suggesting that correlations across performance

indicators do not have a great impact on the estimates of health authority performance

for our dataset. This is despite the fact that the Likelihood Ratio Test indicates that the

MVML model improves inference, and that we find strong and significant

correlations between some performance indicators.

Conclusion

The analysis presented above is used to illustrate an approach to performance

assessment in the presence of multiple objectives. This is deemed preferable to

standard analytical approaches in which objectives either are amalgamated in some

way to create a single index of overall achievement or are considered in isolation. A

single index requires that objectives are weighted in some way so that they can be

aggregated. The relative value to be placed on the objectives of public sector

organisations is a political issue requiring explicit consideration and we contend that

this matter should not be subsumed as a technical part of the analytical process. By

analysing objectives individually, weighting is unnecessary. However, separate

analysis ignores the possibility that objectives may be related. This leads to a loss of

information and potentially biased measures of performance.

The multivariate approach described in this paper overcomes the main weaknesses of

these two standard approaches. It avoids the need to generate a single index and to

weight objectives, but allows for the possibility that objectives are correlated with each

other. It provides policy makers and managers with information that is easier to

interpret and to act upon than a single index, because estimates of performance are

specific to each objective. The multivariate approach also improves the quality of the

statistical analysis because, unlike more conventional approaches, it exploits

information on the correlation between objectives. This provides insight into the

potential trade-offs or synergies between indicators.

The multivariate approach has further advantages over the use of a single index. The

choice of which independent variables to include can be specific to each objective. In

the analysis conducted here this flexibility is little exploited in that, for most of the

equations, a standard set of regressors is applied. Even so, we are able to identify the

relative influence of these variables for each objective, rather than an average effect.

Aside from being more informative, this is important because the direction and size of

influence may differ according to the objective under consideration.

In addition to the multivariate nature of the analysis, the data considered in this paper

are hierarchical. Serious misrepresentations of relative performance may arise from

estimating aggregate OLS models in which objectives are considered in isolation

because of the failure to account for hierarchical data structures or correlations across

objectives. The extent of this misrepresentation varies according to the objective, but

two general observations can be made. First, it will be greater for those objectives

over which organisations have limited influence, simply because OLS estimates will

be more contaminated by random influences. Second, misrepresentation will be less

for those objectives that are highly correlated in a positive direction with other

measures of performance. In such cases, joint analysis merely reinforces the

assessment of separate models. The sensitivity of estimates of relative performance

will be more substantial in the presence of negative correlations among objectives

because achievement against one measure may be counteracted by lesser

achievement against another. This is likely to be an important source of error when

analysing organisations with multiple conflicting objectives. We conclude, therefore,

that consideration should be given to analysing organisational performance on small

area or individual level data and by considering organisational objectives

simultaneously.

There are limitations with the analysis presented in this paper, the most obvious being

that we have been restricted by the cross-sectional nature of our data. Such snapshots

provide only partial insights about performance in contexts where there are likely to

be important dynamic effects. Current performance is likely to be partially attributable

to past efforts, and some degree of current effort may be directed toward future

attainment. The lags and lead times are likely to vary according to each objective. For

instance, efforts to reduce hospital waiting times may realise an effect more rapidly

than health promotion efforts designed to reduce mortality rates. The development of

a truly dynamic model of performance assessment that is able to recognise both past

inheritances and future investments is a major challenge for future research.

References

Carr-Hill, R.A., Hardman, G., Martin, S., Peacock, S., Sheldon, T.A., Smith, P.C., 1994, A formula for distributing NHS revenues based on small area use of hospital beds (Centre for Health Economics, University of York, York).

Cooper, W.W., Seiford, L.M., Tone, K., 2000, Data envelopment analysis: a comprehensive text with models, applications, references and DEA-solver software (Kluwer Academic Publishers, Boston).

Gilthorpe, M.S., Cunningham, S.J., 2000, The application of multilevel, multivariate modelling to orthodontic research data, Community Dental Health 17, 236-242.

Hamilton, L.C., 1992, Resistant normality check and outlier determination, STATA Technical Bulletin Reprints 1, 86-90.

Hauck, K., Rice, N., Smith, P.C., 2003, The influence of health care organisations on health system performance, Journal of Health Services Research and Policy 8, 68-74.

Kaplan, R.S., Norton, D.P., 2001, Transforming the Balanced Scorecard from performance measurement to strategic management, Accounting Horizons 15, 87-104.

Langford, I., Lewis, T., 1998, Outliers in multilevel models, Journal of the Royal Statistical Society: Series A (Statistics in Society) 161, 121-160.

Nutley, S., Smith, P.C., 1998, League tables for performance improvement in health care, Journal of Health Services Research and Policy 3, 50-57.

Rasbash, J., Browne, W., Goldstein, H., Yang, M., Plewis, I., Healy, M., Woodhouse, G., Draper, D., Langford, I., Lewis, T., 2000, A User's Guide to MLwiN, V2.1 (Institute of Education, London).

Rice, N., Leyland, A., 1996, Multilevel models: applications to health data, Journal of Health Services Research and Policy 1, 154-164.

Smith, P.C., Street, A., 2005, Measuring the efficiency of public services: the limits of analysis, Journal of the Royal Statistical Society: Series A (Statistics in Society) 168, 401-417.

Yang, M., Goldstein, H., Browne, W., Woodhouse, G., 2002, Multivariate multilevel analyses of examination results, Journal of the Royal Statistical Society: Series A (Statistics in Society) 165, 137-146.

Zellner, A., 1962, An efficient method of estimating seemingly unrelated regressions and tests of aggregation bias, Journal of the American Statistical Association 57, 348-368.

Table 1: Performance indicators and socio-economic variables

PERFORMANCE INDICATORS DESCRIPTION

HEALTH OUTCOME

SMR064 Standardised mortality ratio for ages 0-64: Ratio of observed deaths from all causes in an area to the expected equivalent given the local age/sex profile and national averages

SMR674 Standardised mortality ratio for ages 65-74: Ratio of observed deaths from all causes in an area to the expected equivalent given the local age/sex profile and national averages

SIR074 Limiting long standing illness for ages 0-74: Ratio of observed number of people reporting limiting illness in an area to the expected equivalent given the local age/sex profile and national averages

CLINICAL QUALITY

EMOLD Emergency admissions of elderly people: Ratio of the rate of over 65 emergency admissions originating from an area to the expected given the age, sex and specialty of a patient and national averages

DEATHS Deaths following hospital surgery: Ratio of 30 day perioperative mortality after elective and non-elective surgery to the expected equivalent given the age, sex and case severity of a patient

ACCESS

WTSURG Waiting time for routine surgery: Ratio of actual waiting time in days for routine surgery to the expected equivalent given the age, sex and specialty of a patient and national averages

WTRADIO Waiting time for radiotherapy: Ratio of actual waiting time in days for radiotherapy to the expected equivalent given the age, sex and specialty of a patient and national averages

WTLONG Percentage of those on waiting list waiting for 12 months or more: Proportion of elective surgery admissions waiting for more than one year standardised for patient characteristics

GPACCS Accessibility to general practitioners (GPs): Indicator of relative accessibility given the supply of GPs, the distance to surgeries and the competition from local populations

ELECTEPS Number of elective surgery episodes: Ratio of standard surgery procedures originating from an area to the expected equivalent given the age, sex and specialty of a patient

EFFICIENCY

DCRATE Day case rate: Proportion of elective episodes in routine surgery treated as day cases standardised for patient characteristics

MATCOST Maternity costs: Ratio of specialty specific fixed and variable costs for episodes to the expected equivalent given national averages

PSYCOST Psychiatry costs: Ratio of specialty specific fixed and variable costs for episodes to the expected equivalent given the age and sex of a patient and national averages

SOCIO-ECONOMIC VARIABLES

OLDAL Proportion of pensionable age living alone

SCARE Proportion of dependents in single carer households

UNEMP Proportion of economically active unemployed

LOPAR Proportion of households headed by a lone parent

NCARE Proportion of dependents with no carer

NCMW Proportion of persons born in the New Commonwealth

Table 2: Descriptive statistics

| Variable | Obs | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| Population details | | | | | |
| Ward population | 4972 | 9644 | 3614 | 2041 | 33073 |
| Health authority population | 186 | 257805 | 113618 | 86777 | 891745 |
| Number of wards within a health authority | 4985 | 32.348 | 15 | 6 | 95 |
| Performance indicators | | | | | |
| SMR064 | 4972 | 0.997 | 0.286 | 0.173 | 2.453 |
| SMR674 | 4972 | 0.997 | 0.231 | 0.229 | 2.424 |
| SIR074 | 4972 | 0.990 | 0.210 | 0.416 | 2.467 |
| EMOLD | 4967 | 1.02 | 0.362 | 0.055 | 9.849 |
| DEATHS | 4967 | 0.997 | 0.422 | 0.000 | 258.650 |
| WTSURG | 4461 | 1.011 | 0.206 | 0.443 | 1.949 |
| WTRADIO | 4967 | 0.933 | 0.105 | 0.001 | 1.000 |
| WTLONG | 4967 | 7.376 | 2.988 | 0.792 | 23.913 |
| GPACCS | 4972 | 0.528 | 0.128 | 0.162 | 0.969 |
| ELECTEPS | 4972 | 1.000 | 0.432 | 0.000 | 3.137 |
| DCRATE | 4461 | 0.383 | 0.083 | 0.138 | 0.618 |
| MATCOST | 4967 | 0.981 | 0.387 | 0.000 | 5.595 |
| PSYCOST | 4967 | 0.994 | 0.521 | 0.065 | 6.190 |
| Socio-economic variables | | | | | |
| OLDAL | 4985 | 0.000 | 0.170 | -0.766 | 0.673 |
| SCARE | 4985 | 0.000 | 0.297 | -1.337 | 0.923 |
| UNEMP | 4985 | 0.000 | 0.500 | -1.217 | 1.704 |
| LOPAR | 4985 | 0.000 | 0.559 | -1.945 | 1.639 |
| NCARE | 4985 | 0.000 | 0.360 | -2.109 | 1.150 |
| NCMW | 4985 | 0.000 | 1.122 | -3.287 | 3.472 |
| PSICK | 4985 | 0.000 | 0.506 | -2.007 | 1.885 |

Table 3: Coefficient estimates (selected indicators only)

Coefficients with an asterisk* are significant at the 5% level. Standard errors in parentheses.

| Indicator | Variable | OLS model | ML model | MVML model |
|---|---|---|---|---|
| SMR064 | constant | 0.998* (0.006) | 0.995* (0.006) | 0.996* (0.006) |
| | oldal | 0.477* (0.092) | 0.237* (0.019) | 0.257* (0.018) |
| | scare | 0.339* (0.064) | 0.131* (0.010) | 0.151* (0.016) |
| | unemp | 0.162* (0.033) | 0.333* (0.010) | 0.306* (0.010) |
| EMOLD | constant | 1.007* (0.013) | 1.008* (0.013) | 1.011* (0.013) |
| | oldal | 0.555* (0.207) | 0.161* (0.029) | 0.208* (0.029) |
| | scare | -0.315* (0.149) | 0.046 (0.026) | 0.122* (0.025) |
| | unemp | 0.381* (0.079) | 0.214* (0.016) | 0.140* (0.015) |
| | e_outlier | 0.815* (0.105) | 0.886* (0.108) | 0.677* (0.095) |
| WTSURG | constant | 1.012* (0.012) | 1.004* (0.014) | 0.996* (0.013) |
| | oldal | -0.858* (0.205) | -0.035* (0.012) | -0.041* (0.012) |
| | scare | 0.137 (0.139) | 0.017 (0.011) | 0.015 (0.010) |
| | unemp | -0.091 (0.071) | 0.035* (0.007) | 0.035* (0.006) |
| DCRATE | constant | -0.390* (0.006) | -0.383* (0.005) | -0.385* (0.005) |
| | oldal | 0.119 (0.094) | 0.038* (0.005) | 0.037* (0.005) |
| | scare | -0.012 (0.064) | 0.013* (0.004) | 0.013* (0.004) |
| | unemp | -0.022 (0.033) | -0.007* (0.003) | -0.006* (0.003) |

Table 4: Correlation of health authority effects from the multivariate multilevel model

Correlation coefficients with an asterisk* are significant at the 5% level.

| | SMR064 | SMR674 | SIR074 | EMOLD | DEATHS | WTSURG | WTRADIO | WTLONG | GPACCS | ELECTEPS | DCRATE | MATCOST |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SMR674 | 0.73* | | | | | | | | | | | |
| SIR074 | 0.62* | 0.91* | | | | | | | | | | |
| EMOLD | 0.00 | 0.15* | 0.05 | | | | | | | | | |
| DEATHS | 0.17* | 0.30* | 0.26* | 0.41* | | | | | | | | |
| WTSURG | -0.16* | -0.22* | -0.20* | -0.13* | -0.03 | | | | | | | |
| WTRADIO | 0.00 | 0.00 | 0.16 | 0.00 | 0.00 | -0.10 | | | | | | |
| WTLONG | -0.12 | -0.25* | -0.21* | -0.13 | -0.07 | 0.95* | -0.13 | | | | | |
| GPACCS | 0.26 | 0.47* | 0.48* | 0.21* | 0.21* | 0.10 | 0.00 | 0.05 | | | | |
| ELECTEPS | -0.15 | -0.32* | -0.25* | -0.13 | -0.13 | 0.32* | -0.07 | 0.34* | -0.07 | | | |
| DCRATE | 0.00 | -0.18 | -0.12 | 0.00 | 0.00 | 0.40* | -0.26 | 0.38* | 0.00 | 0.35* | | |
| MATCOST | 0.10 | 0.00 | 0.10 | 0.02 | -0.04 | 0.14 | 0.26* | 0.12 | 0.07 | -0.16* | -0.15 | |
| PSYCOST | 0.16 | 0.15 | 0.27* | 0.09 | -0.20* | 0.13 | 0.14* | 0.11 | 0.28* | -0.09 | -0.05 | 0.21* |

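Table 5: Variances and intra-class correlation coefficients

| Indicator | OLS model: $\sigma^2_{u}$ | ML model: $\sigma^2_{u_0}$ | ML model: $\sigma^2_{u_0} + \sigma^2_{e_0}$ | $ICC_{ML}$ | MVML model: $\sigma^2_{u_{0i}}$ | MVML model: $\sigma^2_{u_{0i}} + \sigma^2_{e_{0i}}$ | $ICC_{MVML}$ |
|---|---|---|---|---|---|---|---|
| SMR064 | 0.006 | 0.005 | 0.034 | 0.15 | 0.005 | 0.034 | 0.15 |
| SMR674 | 0.008 | 0.006 | 0.031 | 0.19 | 0.006 | 0.031 | 0.19 |
| SIR074 | 0.013 | 0.013 | 0.024 | 0.54 | 0.013 | 0.024 | 0.54 |
| EMOLD | 0.029 | 0.027 | 0.097 | 0.28 | 0.029 | 0.099 | 0.29 |
| DEATHS | 0.028 | 0.028 | 0.104 | 0.27 | 0.029 | 0.105 | 0.28 |
| WTSURG | 0.025 | 0.031 | 0.041 | 0.76 | 0.032 | 0.043 | 0.74 |
| WTRADIO | 0.003 | 0.005 | 0.009 | 0.56 | 0.003 | 0.007 | 0.43 |
| WTLONG | 5.248 | 6.097 | 8.778 | 0.69 | 6.112 | 8.793 | 0.70 |
| GPACCS | 0.003 | 0.003 | 0.010 | 0.30 | 0.003 | 0.010 | 0.30 |
| ELECTEPS | 0.082 | 0.078 | 0.187 | 0.42 | 0.078 | 0.187 | 0.42 |
| DCRATE | 0.005 | 0.005 | 0.007 | 0.71 | 0.005 | 0.007 | 0.71 |
| MATCOST | 0.087 | 0.077 | 0.137 | 0.56 | 0.076 | 0.136 | 0.56 |
| PSYCOST | 0.061 | 0.071 | 0.207 | 0.34 | 0.070 | 0.206 | 0.34 |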

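Figure 1: The production possibility frontier: different preferences lead to different weights

[Figure 1 not reproduced. Axes: y1, y2; frontier FF; indifference curves IC1 and IC2.]

Figure 2: Performance comparisons across multiple dimensions

[Figure 2 not reproduced. Axes: y1, y2; frontier FF; points A, C and D; origin 0.]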

Figure 3: Sensitivity analysis of health authority effects for ‘Standardized mortality ratio (ages 0-64)’

Figure 4: Sensitivity analysis of health authority effects for ‘Emergency admissions of elderly people’

Figure 5: Sensitivity analysis of health authority effects for ‘Waiting time for routine surgery’

Figure 6: Sensitivity analysis of health authority effects for ‘Day case rate’