HEALTH ECONOMICS

Health Econ. 12: 355–366 (2003)

Published online 3 January 2003 in Wiley InterScience (www.interscience.wiley.com). DOI:10.1002/hec.761

INTERNATIONAL STATISTICAL COMPARISONS

A critique of the World Health Organisation’s evaluation of health system performance

Jeff Richardson a,*, John Wildman b and Iain K. Robertson c

a Health Economics Unit, Centre for Health Program Evaluation, Monash University, Victoria, Australia

b Department of Economics, University of Newcastle Upon Tyne, UK

c Centre for Health Program Evaluation, Monash University, Australia

Summary

The World Health Organisation’s (WHO) approach to the measurement of health system efficiency is briefly described. Four arguments are then presented. First, equity of finance should not be a criterion for the evaluation of a health system and, more generally, the same objectives and importance weights should not be imposed upon all countries. Secondly, the numerical values of the importance weights do not reflect their true importance in the country rankings. Thirdly, the model for combining the different objectives into a single index of system performance is problematical and alternative models are shown to alter system rankings. The WHO statistical analysis is replicated and used to support the fourth argument, which is that, contrary to the authors’ assertion, their methods cannot separate true inefficiency from random error. The procedure is also subject to omitted variable bias. Country rankings based upon the model are correspondingly unreliable. It is concluded that, despite these problems, the study is a landmark in the evolution of system evaluation, but one which requires significant revision. Copyright © 2002 John Wiley & Sons, Ltd.

Keywords World Health Organisation; WHO; health system; efficiency; performance

Introduction

Background

In the World Health Report 2000 [1], the World Health Organisation (WHO) presents, in its own words, ‘new concepts and measures which lay the empirical basis for assessing health system performance’ (p. 144). The conceptual steps in the construction of the WHO performance measures are shown in Figure 1. Five objectives are nominated as evaluative criteria. These are measured and become the basis for two parallel streams of analyses. In the first, a summary measure of a nation’s health is examined, namely, the disability adjusted life expectancy or DALE. Second, an index of systemic achievement (the ‘overall goal attainment index’) is constructed from the five objectives (including health). In both streams the key variable (DALE or the performance index) is explained econometrically by the level of a country’s health expenditures and education. These are assumed to fully represent the inputs into the ‘production’ of national health and system achievement. ‘Poor’ performance is defined by inefficiency, which is the difference between a country’s actual achievement and the achievement that should be possible with the country’s health expenditure and education. This, in turn, is determined empirically by the performance of the most efficient country.

Copyright © 2002 John Wiley & Sons, Ltd.

Received 22 October 2001

Accepted 23 July 2002

*Correspondence to: Health Economics Unit, Centre for Health Program Evaluation, Monash University, PO Box 477, West Heidelberg, VIC 3081, Australia. E-mail: [email protected]


Details of the technical analysis are discussed further below.

Goals and their measurement

The five objectives are said to be:

* maximising population health;
* reducing inequalities in population health;
* maximising health system responsiveness;
* reducing inequalities in responsiveness; and
* financing health care equitably.

The health of the population is measured by the DALE from birth. This statistic uses the published annual population and mortality experience in each country for each age cohort, and converts them into DALEs for each country using disability norms for different diseases. For any particular country, the amount of disability due to a particular disease is not measured, but assumed to equal the average disability rates in the sample countries used in the production of the norms.

The distribution of DALEs is not used in the analysis. Instead, small area variation in childhood mortality is used as a proxy for distribution of population health.

The responsiveness and distribution of system responsiveness to clients were calculated from the results of a convenience survey which elicited best guesses by 1791 respondents about the degree of respect for persons (dignity, autonomy, confidentiality) and client orientation (prompt attention, quality of basic amenities, access to social support networks, choice of provider).

Fair financing is defined as the sum of all costs paid directly or indirectly by households as a percentage of their disposable income. It is discussed further below.

Figure 1. Steps in the construction of the WHO country scores and ranking. (Diagram: the five objectives (health, responsiveness, equity) are measured and yield two outputs, health (DALEs) and the attainment index, a combination of the five objectives including health. Each output is compared with the value predicted from health expenditure and education to give performance (efficiency), which is then ranked.)

The overall efficiency of the system is measured by a performance index which is a simple weighted average of the scores obtained on the attainment scales for the five objectives. Weights obtained from survey respondents are assigned to each of these as follows:

Health (DALEs)                   0.25
Health distribution              0.25
Responsiveness                   0.125
Distribution of responsiveness   0.125
Fairness of system financing     0.25
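The weighted average described above can be sketched in a few lines of Python. The weights are those listed in the text; the function name, the dictionary keys and the example scores are hypothetical and chosen only for illustration. They are not WHO data.

```python
# Nominal WHO importance weights as listed in the text.
WHO_WEIGHTS = {
    "health": 0.25,
    "health_distribution": 0.25,
    "responsiveness": 0.125,
    "responsiveness_distribution": 0.125,
    "fair_financing": 0.25,
}


def attainment_index(scores, weights=WHO_WEIGHTS):
    """Weighted average of attainment scores, each on a 0-100 scale."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same objectives")
    total_weight = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total_weight


# Hypothetical scores for an imaginary country (illustrative only).
example_scores = {
    "health": 80.0,
    "health_distribution": 60.0,
    "responsiveness": 70.0,
    "responsiveness_distribution": 90.0,
    "fair_financing": 40.0,
}
example_index = attainment_index(example_scores)
print(round(example_index, 2))
```

Because the index is linear in every score, a shortfall on one objective can always be offset by a gain on another at a fixed rate. This property is taken up in the later discussion of the aggregation model.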

Results

A selection of the results is reported in Table 1. Prima facie, some of these are surprising. For example, in terms of DALEs and overall goal attainment Australia ranks 2nd and 12th, respectively. However, the methodology reduces its ranking with respect to these objectives to 39th and 32nd, respectively. Yet Australia’s health expenditures per capita in 1997 were almost exactly equal to the OECD average (excluding Turkey, Portugal, Spain and Mexico). Years of schooling for those above 15 were only 11.1% greater than the average. There is no other obvious aspect of Australia’s health system, social or individual behaviour to explain such a downward revision. This suggests that the improbable results may be an artefact arising from the imperfect measurement of variables or from unreliable modelling or econometric methodology.

In the March 2001 edition of ‘Health Economics’ Alan Williams [2] lambastes the report primarily for the former reason.

(The index) is based on very little actual data, which is often heavily manipulated to make it usable, and then subjected to a great deal of rather adventurous modelling. . . (p. 97).

Apart from the DALE calculations, which are all partly factual and partly speculative, the only indicator for the USA which is not imputed is child mortality, and for Denmark they are all imputed. So it was not Real Denmark that was rated below Real USA, but Fictional Denmark that was rated below Fictional USA (p. 99).

Table 1. Health system attainment and performance in selected countries ranked by four measures (estimates for 1997)

               Attainment of goals           Performance
Country        Health     Overall goal       On level     Overall health
               (DALEs)    attainment         of health    system performance
Australia         2          12                 39           32
Austria          17          10                 15            9
Belgium          16           3                 28           21
Canada           12           7                 35           30
Denmark          28          20                 65           34
Finland          20          22                 44           31
France            3           6                  4            1
Germany          22          14                 41           25
Japan             1           1                  9           10
Luxembourg       18           5                 31           16
Malta            21          31                  2            5
Netherlands      13           8                 19           17
New Zealand      31          26                 80           41
Oman             72          59                  1            8
Portugal         29          32                 13           12
Russian Fed      91         100                127          130
Spain             5          19                  6            7
Sweden            4           4                 21           23
Switzerland       8           2                 26           20
Turkey           73          96                 33           70
UK               14           9                 24           18
USA              24          15                 72           37

Source: WHO, World Health Report 2000.


Williams does concede that ‘At a technical level there is much ingenious analytical work behind this report. . .’ (p. 98).

In the present article we amplify some of Williams’ comments on objectives and importance weights but focus primarily upon some technical issues of system modelling. More specifically we comment on four aspects of the WHO study. These are:

(i) the objectives of the health system;
(ii) the importance weights;
(iii) the model used to construct the composite performance index; and
(iv) the econometric modelling of predicted health and system performance.

Objectives of the health system

One of the great contributions of the WHO study is that it makes explicit the existence of multiple objectives rather than the two – health outcome and ‘equity’ – acknowledged in most economic analyses. It further challenges orthodoxy by suggesting that, quantitatively, fairness may be of greater importance than health outcome per se. The selection and treatment of objectives, however, are contestable.

First, the WHO methodology imposes the same objectives upon all countries. This is legitimate if the purpose of the WHO is to evaluate health systems according to the values and ethical judgements of the WHO. However, it would be unfair and misleading to describe a country’s health system as inefficient because it did not perform well when judged by objectives which were not its own. This issue is the essential element in the debate between the proponents of the WHO model [3] and Blendon et al. [4], who show that for 17 industrialised countries the WHO ranking of performance does not correspond with a ranking based upon consumer satisfaction – a different but defensible systemic objective. The self-evident fact that the rank order of a country’s performance may change when objectives change is illustrated later in Table 4.

Secondly, it is assumed that the same loss of DALEs occurs for the same disease in every country. As highlighted by Reidpath et al. [5], this implies the same loss of well-being for the quadriplegic in Cameroon who must drag herself through the mud to reach a latrine as for the wealthy Australian quadriplegic in her electric wheelchair and in an environment modified to assist the disabled. Thirdly, as we discuss below, the importance weights and the model used to combine attainment scores are problematical. Fourthly, we argue below that fair financing is, at best, an ambiguous concept and should not be included as an objective of the health system.

Fair financing

There is considerable appeal in the argument that the financing of a health system should be ‘fair’ [6]. Despite this, the measurement of fair financing is problematical. For example, Wagstaff [7] has shown that the WHO index of fair financing does not discriminate between systems that are regressive and progressive or between horizontal inequity and progressivity/regressivity. There are, however, three more fundamental problems with the interpretation of an index of fair financing and these cast doubt upon the interpretation of this (or any other) distributive index.

First, the criterion of fair financing of the health sector conflates two social objectives, namely, the achievement of health and health-related fairness. The second concerns the distribution of income. Health per se may be affected by the distribution of income, and access to health services will be affected by the existence of financial barriers. However, access is a different issue from fair financing. Private out-of-pocket expenditures are borne exclusively by sick people and this does raise a legitimate ethical issue: should the sick have to pay more; should there be a ‘tax on the sick’? But the majority of funds in most countries are derived from tax revenues and the composition of government taxation is unrelated to any objective relating to health per se or health-related fairness.

Second, and following from this, measurement of the incidence of health revenues is conceptually questionable and potentially misleading. The incidence of out-of-pocket expenditures is unambiguous. In contrast, tax revenues cannot be causally linked to particular taxes. Attribution rules are arbitrary (as with the attribution of fixed costs). For example, a common and appealing rule is to apportion health expenditures to taxes in proportion to total tax revenues from each source. This, however, implies that the evaluation of fairness of government tax-based health revenues is equivalent to the evaluation of national taxation and this should not be considered as an objective of the health system per se.

Even if the incidence of health revenues could be unambiguously determined, the interpretation of the result could remain contentious. This is easily illustrated in Table 2. In this, a nominal health insurance premium is levied in two countries in a way which implies greater equity in Country A. In this country the poor pay nothing. In Country B the percentage contribution by the poor is double the contribution of the rich. Prima facie, this appears to be inequitable. However, in this second country the rich are independently taxed at a far higher rate than in Country A. The overall impact of these taxes leaves the rich in Country A with 60% of their gross earnings, which is significantly more than the 45% retained by the rich in Country B. The poor in Country A are left with 70% of gross income, less than the 90% in Country B: the country with the superficially inequitable contribution to health insurance. The general point is that equity cannot sensibly be segmented. The higher insurance premium for the poor in Country B may be an explicit recognition of their low level of general taxation.
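The arithmetic behind this example can be checked with a short Python sketch. The percentages are the Table 2 figures quoted in the paragraph above; the dictionary layout and function name are hypothetical and used only for illustration.

```python
# Outlays as a percentage of gross income (figures from the Table 2 example).
outlays = {
    ("A", "rich"): {"tax": 30, "premium": 10, "out_of_pocket": 0},
    ("A", "poor"): {"tax": 30, "premium": 0, "out_of_pocket": 0},
    ("B", "rich"): {"tax": 50, "premium": 5, "out_of_pocket": 0},
    ("B", "poor"): {"tax": 0, "premium": 10, "out_of_pocket": 0},
}


def retained_share(outlay):
    """Share of gross income left after tax, premium and out-of-pocket costs."""
    return 100 - sum(outlay.values())


retained = {group: retained_share(o) for group, o in outlays.items()}
for (country, income_group), share in retained.items():
    print(f"Country {country}, {income_group}: {share}% retained")
```

Judged on the premium alone, Country A looks fairer (the poor pay nothing). Judged on retained income, the poor in Country B keep 90% against 70% in Country A, which is the point of the example.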

Finally, and most fundamentally, the index of fair financing derived here and elsewhere does not provide an unambiguous answer to an interesting question. In the present context at least three questions might be asked. First, ‘what was the effect upon the distribution of income when a national health system was created?’ (The answer might require the cumulation of incremental changes that occurred through time.) The question, however, is only of historical interest, as other elements of taxation might have been adjusted because of the initial effects of the health financing legislation.

The second question is ‘what is the effect upon the present distribution of income of present health financing arrangements?’ As noted above, the answer to this question requires the use of attribution rules to link health revenues with particular taxes. However, there is generally no causal relationship between particular taxes and health revenues. As the growth of health expenditures rises and falls there is no corresponding increase and decrease in the taxes to which health revenues have been attributed. As also noted, the common pro rata rule reflects not health-related fairness but the fairness of the entire tax system.

Thirdly, it may be asked whether a country’s health financing is more or less regressive than health financing in other countries. While the answer to this question again requires the use of an attribution rule, it may be argued that the use of the same rule will lead to a valid comparison. The argument is not, however, correct. The same attribution rule applied to differing taxation systems will produce results reflecting the overall taxation system. In general the consistent use of an invalid method does not lead to valid conclusions.

The attempt to determine the distributional effect of a health system encounters a conceptual problem that has been neglected in the literature. This is that the impact of one scheme may only be determined by comparison with another scheme. The first rule of economic evaluation is that there must be a comparator. The fair financing literature has not made this explicit and has commonly and implicitly adopted an unrealistic comparator. The assumption that tax revenues can be attributed pro rata to total taxation is equivalent to assuming that if a national health scheme were to be dismantled then the revenue saving to government would be returned to taxpayers in direct proportion to the size of each tax. Further, it must be assumed that nothing further would happen; people would not find alternative methods for financing health services. If they were to do so then the effect of dismantling the national health scheme would depend upon the (unknown) new health scheme. In sum, economic evaluation, including the evaluation of tax incidence, requires a choice between alternatives and, as sunk costs and sunk benefits are irrelevant, the appropriate choice is between future options. Without a knowledge of the future options the distributional effect of the status quo cannot be determined.

Table 2. Equity tax and health financing

                            Country A           Country B
Outlay                      Rich (%)  Poor (%)  Rich (%)  Poor (%)
Tax                           30        30        50         0
Health Ins. Premium           10         0         5        10
Out of pocket                  0         0         0         0
Pte consumption / Saving      60        70        45        90
Total                        100       100       100       100

The true incidence of the financing system can only, therefore, be determined by a consideration of the counterfactual option and this might be more or less equitable than the existing system. For example, if a universal NHS, entirely financed from proportional taxation, was scrapped, at least three outcomes could occur. The revenue saving from the cessation of government health expenditures could be used to reduce the proportional taxes; income taxes on the wealthy could be reduced (as proposed in the USA by President Bush); or social service expenditures could rise. The retention and financing of the NHS would therefore stave off a redistribution where the initial effect could be neutral, regressive or progressive.

The conclusion from this discussion is that the interpretation of any index of fair financing is fraught with conceptual difficulties. The incidence of patient charges may be unambiguously determined and contrasted between countries. Likewise the incidence of policy initiatives is of great social relevance. With some trepidation it might be suggested that the counterfactual health scheme in countries with highly progressive tax systems would probably be financed by methods that were more regressive. But conceptual ambiguities make the interpretation of precise indices of fair financing highly contentious.

Importance weights

Prima facie, the weights attached to the WHO model are implausible. Health achieves an importance weight of only 0.25 and the other four non-health objectives have a combined importance weight of 0.75. As Williams [2] points out, a significant drop in health outcome might be offset by a comparatively modest increase in equity. However, the nominal weights are a misleading indicator of the importance of the weights in the country rankings. The effective weight depends upon the nominal weight but also upon the variation in the scores to which it is applied. For example, if there was no difference in the distribution of health across countries then, despite the large importance weight of 0.25, distribution would contribute nothing to the variation in the performance indices and would be of no importance for the ranking of health systems.

In Table 3 an adjustment is carried out to take into account the range of values for each of the objectives. Column 1 reports the nominal importance weights. Column 2 is the maximum less minimum score for each of the objectives for all countries in the study (Column 2a) and for the 50 countries with the highest attainment level (Column 2b). The maximum impact of each objective on the attainment level, Column (1) × Column (2), is reported in Column 3 and standardised effective weights (i.e. weights which sum to 1.00) are reported in Column 4 for the two groups of countries. Results indicate that the effective weight for health is significantly greater than the nominal weight. The weighting for equity in health does not change significantly. Responsiveness more than doubles its importance weight for the top 50 countries but almost halves its importance for all countries. In the standardised responses equity in responsiveness ceases to be of importance for the top 50 countries and fair financing is of reduced importance, particularly for the top 50 countries.

Table 3. Importance weights: apparent and effective (a)

                     Weight   Range (max to min)   Wt. × range        Effective weight (a)
Objective                     All      Top 50      All      Top 50    All      Top 50
Column               1        2a       2b          3a       3b        4a       4b
Health               0.25     0.912    0.173       0.22     0.04      0.33     0.37
Equity in Health     0.25     0.754    0.107       0.19     0.026     0.29     0.24
Responsiveness       0.125    0.414    0.237       0.052    0.03      0.08     0.28
Equity in Respons.   0.125    0.586    0.018       0.073    0.002     0.11     0.02
Fair Finance         0.25     0.524    0.034       0.131    0.01      0.20     0.09
Total                1.00     -        -           0.666    0.108     1.00     1.00

(a) The ‘effective weight’ is the weight which must be assigned to a 100 point scale with endpoints set equal to the maximum and minimum observed score so that the (new) weighted score adds exactly the same to the total score as the old weighted score and range. Its advantage is that the contribution to the final index is immediately apparent from the importance weight without reference to the range of scores.
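The adjustment in Table 3 can be sketched as follows. The nominal weights and ranges are the Table 3 entries (Columns 1, 2a and 2b); the function and variable names are hypothetical. Because the published table rounds the intermediate products in Column 3, the values computed here from the unrounded products agree with Column 4 only to within about 0.02.

```python
OBJECTIVES = [
    "health",
    "equity_in_health",
    "responsiveness",
    "equity_in_responsiveness",
    "fair_finance",
]
NOMINAL = [0.25, 0.25, 0.125, 0.125, 0.25]          # Column 1
RANGE_ALL = [0.912, 0.754, 0.414, 0.586, 0.524]     # Column 2a
RANGE_TOP50 = [0.173, 0.107, 0.237, 0.018, 0.034]   # Column 2b


def effective_weights(nominal, ranges):
    """Scale each nominal weight by its score range, then normalise to sum to 1."""
    impact = [w * r for w, r in zip(nominal, ranges)]   # Column 3
    total = sum(impact)
    return [x / total for x in impact]                  # Column 4


eff_all = effective_weights(NOMINAL, RANGE_ALL)
eff_top50 = effective_weights(NOMINAL, RANGE_TOP50)
for name, a, t in zip(OBJECTIVES, eff_all, eff_top50):
    print(f"{name:26s} all={a:.2f} top50={t:.2f}")
```

In both groups the effective weight for health exceeds its nominal weight of 0.25, and for the top 50 countries equity in responsiveness has almost no influence on the index.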


Modelling the attainment index

The WHO combines the achievement with respect to each objective into a single index for the health system as a whole. This process of aggregation may be done a number of ways and four options are given in Equations (1)–(4).

Four weighting formulae

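$$I_j = \sum_i w_i S_{ij} \qquad (1)$$

$$I_j = \Big\{ 1 - \prod_i \left( 1 - w_i s_{ij} \right) \Big\} \times 100 \qquad (2)$$

$$I_j = \Big\{ 1 - \prod_i \left( 1 - s_{ij} \right)^{w_i} \Big\} \times 100 \qquad (3)$$

$$I_j = H_j \Big( w_h + \sum_{i \neq h} w_i s_{ij} \Big) \qquad (4)$$

Here $I_j$ is the attainment index for country $j$, $w_i$ the importance weight for objective $i$ (with $w_h$ the weight for health), $H_j$ the health outcome for country $j$, $S_{ij}$ the numerical score for objective $i$ in country $j$, and $s_{ij} = S_{ij}/100$. In Equation (2) each factor is written as $(1 - w_i s_{ij})$, the form obtained from the general rule in Equation (5) with $k = -1$ and the form that reproduces the index numbers in Table 4.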

The first option is the additive model adopted by the WHO. The overall index, I, is a weighted average of attainment scores where the weights sum to unity as shown in the second column of Table 4. While it is the simplest and most commonly used combination model, Equation (1) has some undesirable properties. For example, a society may consider its health scheme to be very poor either if health outcome is very poor and equity very good or if the health outcome is very good but equity very poor. There is no compelling reason why people’s assessment of their health system should decline in the way prescribed by the additive model. A further property of the additive model is that a reduction in the value of one score does not affect the importance of other scores. With the WHO weights, a country might obtain an index score of 0.75 for responsiveness and for the equitable distribution of useless health services. This is an undesirable implication of the aggregation model.

Table 4. Four models of system performance

                               Sweden                           USA
                               Weight (w_i)  Score   Index      Weight (w_i)  Score   Index
                                             (s_i)   (max 100)                (s_i)   (max 100)
                               (1)    (2)    (3)     (4)        (1)    (2)    (3)     (4)
Health (H)                     0.7    0.47   75                 0.9    0.6    95
Equity (E)                     0.8    0.53   95                 0.6    0.4    50
Model 1 (a): Additive
  w Sweden (b)                                       86    >                          71
  w USA (c)                                          83    >                          77
Model 2 (d): Multiplicative
  w Sweden                                           89    >                          80
  w USA                                              86    <                          90
Model 3 (d): Exponential
  w Sweden                                           97    >                          93
  w USA                                              95    <                          96
Model 4 (a): Eclectic
  w Sweden                                           73    >                          70
  w USA                                              74    <                          76

(a) Models 1 and 4 use standardised weights which sum to unity.
(b) Swedish weights.
(c) US weights.
(d) Models 2 and 3 use unadjusted weights.

The simple additive model is a special case of the general combination rule suggested in decision analysis [8]. This is shown in Equation (5).

$$I = \frac{1}{k} \Big\{ \prod_i \left( 1 + k\, w_i s_i \right) - 1 \Big\} \times 100 \qquad (5)$$

The additive formula occurs as the overall scaling constant, $k$, approaches zero. In other cases Equation (5) reduces to a multiplicative model, and with $k = -1.0$ ($\sum_i w_i$ significantly exceeds 1.0) Equation (5) becomes Equation (2). An exponential form of the multiplicative model is defined by Equation (3). The great advantage of this combination rule is that each dimension may reduce the overall attainment index to zero. In contrast, in the additive model with, for example, four dimensions, the importance weight of each dimension must, on average, equal 0.25.
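The two special cases mentioned above can be verified directly from Equation (5). The short derivation below is a sketch that uses only the symbols already defined ($k$, $w_i$, $s_i$). It assumes that the product is expanded to first order in $k$.

```latex
% Case k = -1: the general rule becomes the multiplicative form.
I \;=\; \frac{1}{-1}\Big\{\prod_i (1 - w_i s_i) - 1\Big\}\times 100
  \;=\; \Big\{1 - \prod_i (1 - w_i s_i)\Big\}\times 100 .

% Case k -> 0: expand the product to first order in k.
\prod_i (1 + k\,w_i s_i) \;=\; 1 + k\sum_i w_i s_i + O(k^2)
\quad\Longrightarrow\quad
\lim_{k \to 0} I \;=\; \Big(\sum_i w_i s_i\Big)\times 100 .
```

Since $s_i = S_i/100$, the limiting case is the weighted sum $\sum_i w_i S_i$, that is, the additive model.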

The choice of these methods is not simply a matter of taste. Rather, it should be determined by the structure of preferences and the desired properties of the index. Thus, for example, both of these multiplicative models have the disadvantage that, like the additive model, a significant index number could be achieved when health has a score of zero. This problem is overcome in the fourth, ‘eclectic’ model. In this, the overall index number is directly proportional to health and the importance of other objectives depends upon the importance of health.

The ranking of health systems depends upon both the importance weights and the choice of model. This is illustrated in Table 4, which gives the index number derived from each of the four models with hypothetical data for two hypothetical countries, Sweden and the USA. These have (arbitrarily selected) importance weights which indicate that Sweden places a greater emphasis upon equity and the USA upon health per se. The first column of weights represents unadjusted importance scores and in the second column these are standardised. For each of the models defined by Equations (1)–(4) both Swedish and US weights are used.

The result illustrated here is that the ranking of the two countries with respect to the ‘weighted scores for health and equity’ depends upon which country’s weights are selected and which combination rule is used. With Swedish weights Sweden out-performs the USA with each combination rule. In contrast, with US weights the rank order is reversed in three of the four models.

With the simple additive model, Sweden unambiguously out-performs the USA in Table 4. With models 2–4 the outcome depends upon which weights are selected. With Swedish weights the index number in models 2, 3 and 4 favours Sweden by 9, 4 and 3 percentage points. With US weights, the US index numbers exceed the Swedish by 4, 1 and 2 percentage points (90 against 86, 96 against 95 and 76 against 74 in Table 4). This illustrates the fact that, with given attainment scores and weights, apparent ‘efficiency’ will also depend upon the choice of the combination rule.
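The comparison in Table 4 can be reproduced with a short Python sketch. The scores and unadjusted weights are the hypothetical Table 4 values; the function names are illustrative. Two assumptions are made, following the table footnotes: Models 1 and 4 use weights standardised to sum to unity, and Models 2 and 3 use the unadjusted weights. The multiplicative model is coded with factors (1 − w_i s_ij), the form obtained from Equation (5) with k = −1, since this is the form that reproduces the tabulated figures.

```python
import math


def additive(weights, scores):
    """Equation (1): weighted average using standardised weights."""
    total = sum(weights.values())
    return sum(w / total * scores[k] for k, w in weights.items())


def multiplicative(weights, scores):
    """Equation (2): unadjusted weights, scores rescaled to the 0-1 range."""
    product = math.prod(1 - w * scores[k] / 100 for k, w in weights.items())
    return (1 - product) * 100


def exponential(weights, scores):
    """Equation (3): unadjusted weights applied as exponents."""
    product = math.prod((1 - scores[k] / 100) ** w for k, w in weights.items())
    return (1 - product) * 100


def eclectic(weights, scores, health="H"):
    """Equation (4): index proportional to health, standardised weights."""
    total = sum(weights.values())
    other = sum(
        w / total * scores[k] / 100 for k, w in weights.items() if k != health
    )
    return scores[health] * (weights[health] / total + other)


SCORES = {"Sweden": {"H": 75, "E": 95}, "USA": {"H": 95, "E": 50}}
WEIGHTS = {"Swedish": {"H": 0.7, "E": 0.8}, "US": {"H": 0.9, "E": 0.6}}
MODELS = {
    "additive": additive,
    "multiplicative": multiplicative,
    "exponential": exponential,
    "eclectic": eclectic,
}

# (model, weight set) -> (index for Sweden, index for USA)
results = {
    (model, wname): (fn(w, SCORES["Sweden"]), fn(w, SCORES["USA"]))
    for model, fn in MODELS.items()
    for wname, w in WEIGHTS.items()
}
for (model, wname), (swe, usa) in results.items():
    print(f"{model:15s} {wname:8s} Sweden={swe:5.1f} USA={usa:5.1f}")
```

Under these assumptions the computed indices match Table 4 to within rounding. Sweden leads under all four rules with Swedish weights, and the USA leads under rules 2 to 4 with US weights.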

The econometric model

Performance scores were obtained by the WHO for each country with respect to health per se (DALEs) and goal achievement (the overall performance index). The econometric procedures used to generate the two sets of rankings in Annex Table 10 in the World Health Report were replicated. Other papers have considered the problems and extensions to the estimation procedure more thoroughly [9]. The WHO procedure does not overcome the problems associated with estimating efficiency performance.

The WHO estimation procedures and replication of the WHO model

The efficiency estimates in the World Health Report 2000 are obtained by combining fixed effects panel data methods with a standard corrected ordinary least-squares procedure. Efficiency is measured for both outcome variables, DALEs and the composite index. The models used to estimate each output are reduced forms of a trans-log model (a flexible functional form) and are identical for each output.
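The general idea of combining a fixed effects estimator with a corrected least-squares shift can be sketched in Python. This is a minimal illustration and not the WHO specification. It assumes a single log input rather than a trans-log form, uses synthetic data, and measures each unit's efficiency relative to the largest estimated intercept. All names and parameter values are hypothetical.

```python
import math
import random

rng = random.Random(0)

# Synthetic panel: 5 hypothetical countries observed for 6 years.
true_alpha = [1.0, 0.8, 0.9, 0.5, 0.7]   # country effects (assumed)
true_beta = 0.3                          # common slope (assumed)
n_years = 6

panel = []
for alpha in true_alpha:
    xs = [rng.gauss(2.0, 0.5) for _ in range(n_years)]
    ys = [alpha + true_beta * x + rng.gauss(0.0, 0.01) for x in xs]
    panel.append((xs, ys))


def mean(values):
    return sum(values) / len(values)


# Within (fixed effects) estimator: demean each country's series over time,
# then pool the demeaned observations to estimate the common slope.
num = den = 0.0
for xs, ys in panel:
    x_bar, y_bar = mean(xs), mean(ys)
    num += sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den += sum((x - x_bar) ** 2 for x in xs)
beta_hat = num / den

# Recover country intercepts, then shift so the best country defines the frontier.
alpha_hat = [mean(ys) - beta_hat * mean(xs) for xs, ys in panel]
frontier = max(alpha_hat)
efficiency = [math.exp(a - frontier) for a in alpha_hat]

for i, e in enumerate(efficiency):
    print(f"country {i}: efficiency = {e:.3f}")
```

By construction one unit always receives an efficiency of exactly 1, and every other score absorbs whatever estimation error remains in its intercept. With short panels that error cannot be distinguished from inefficiency.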

The WHO criticise data envelopment analysis (DEA) and free disposable hull methods for being unable to separate true inefficiency from random variation [10]. The authors also criticise corrected ordinary least-squares (COLS) methods for not tightly bounding the data, for being problematic if heteroskedasticity is evident and for suffering from an inability to distinguish between random errors and true inefficiency. Despite this, the COLS method is used to estimate the minimum level of health. Stochastic frontier models are rejected due to the need to impose distributional assumptions on the efficiency term and because jointly determined estimates of the error term are unreliable.

Evans et al. claim that panel data estimation is more efficient than models using cross-sectional data in extracting information on inefficiencies. They also claim that panel data estimation is preferable to stochastic frontier techniques because there is no need to specify the distribution of the inefficiency terms. Fixed effects estimation is used because the models do not pass a Hausman test, suggesting that the regressors are correlated with the error term, leading to biased estimates using a random effects model. Fixed effects models provide consistent and unbiased estimates of the coefficients, although they cannot estimate coefficients which are time invariant since parameters are identified by variation across time (within observation identification). It has been suggested that the within observation variation is very small and so the models are not identified [11].

There are two other estimation problems with the WHO approach. First, the analysis has been carried out using fixed effects panel data models because the models estimated did not pass a Hausman test of specification. For estimating marginal effects the fixed effects model returns unbiased (although inefficient) estimators as either N (number of observations) or T (time) goes to infinity. For the individual specific term the fixed effects estimator is inconsistent, regardless of the size of N [12]. Such inconsistency casts doubt over results presented using the fixed effects approach.
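The source of this inconsistency can be sketched in a few lines. The derivation assumes a model of the form $Y_{it} = \alpha_i + X_{it}\beta + v_{it}$ with strictly exogenous regressors and independent errors of variance $\sigma_v^2$; these textbook assumptions are introduced here for illustration. Writing $\bar{Y}_i$, $\bar{X}_i$ and $\bar{v}_i$ for the country means over the $T$ periods, the estimated country effect is

$$\hat{\alpha}_i = \bar{Y}_i - \bar{X}_i\hat{\beta} = \alpha_i + \bar{v}_i - \bar{X}_i(\hat{\beta} - \beta), \qquad \bar{v}_i = \frac{1}{T}\sum_{t=1}^{T} v_{it}.$$

As $N \to \infty$ with $T$ fixed, $\hat{\beta}$ converges to $\beta$ and the last term vanishes, but

$$\operatorname{Var}(\bar{v}_i) = \frac{\sigma_v^2}{T}$$

does not depend on $N$. The noise in $\hat{\alpha}_i$ therefore shrinks only as the panel becomes longer, not as more countries are added.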

The second problem is the treatment of efficiency as time invariant. Kumbhakar and Lovell [13] state that over long panels, or over any period in which structural change could have occurred, assuming time invariance is problematical. For many countries, such as those breaking away from the former USSR or those in periods of civil war, such issues will be vital. Efficiency needs to be measured using time-varying parametric panel data techniques as suggested by Cornwell et al. [14] or non-parametric-based panel data Malmquist indices as reviewed by Hollingsworth et al. [15]. Non-parametric techniques have another advantage as they can estimate how the frontier technology is changing over time. The parametric methods used by the WHO ensure that one country is always efficient, whereas Malmquist indices allow the potential production technology to vary; it may be that the efficiency frontier is moving further away from the least-efficient countries.
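A brief sketch of the time-varying parametric alternative may help. In the approach of Cornwell et al. [14] the time invariant country effect is replaced by a country specific quadratic function of time, and inefficiency is measured against the best country in each period. The symbols $\theta_{i1}$, $\theta_{i2}$ and $\theta_{i3}$ are notation introduced here for illustration:

$$\alpha_{it} = \theta_{i1} + \theta_{i2}\,t + \theta_{i3}\,t^{2}, \qquad \hat{u}_{it} = \max_{j}(\hat{\alpha}_{jt}) - \hat{\alpha}_{it}.$$

Under this specification both the frontier and each country's distance from it can change from year to year, at the cost of estimating three parameters per country rather than one.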

The WHO model

Evans et al. specify the following model for both DALEs and the composite index:

$$Y_{it} = \alpha + X_{it}\beta + v_{it} - u_{i} \tag{6}$$

which, defining $\alpha_i = \alpha - u_i$, can be re-written as

$$Y_{it} = \alpha_{i} + X_{it}\beta + v_{it} \tag{7}$$

The intercept $\alpha_i$ is a country specific effect and can be estimated using the fixed effects method. The frontier intercept is represented by $\alpha$ (the highest level of efficiency) and each country’s level of inefficiency is calculated as

$$\hat{\alpha} = \max_{i}(\hat{\alpha}_{i}) \quad \text{and} \quad \hat{u}_{i} = \hat{\alpha} - \hat{\alpha}_{i}$$

where the $u_i$’s are the country specific inefficiency terms. Technical efficiency is defined as

$$TE_{i} = \frac{E[Y_{it} \mid u_{i}, X_{it}]}{E[Y_{it} \mid u_{i} = 0, X_{it}]} \tag{8}$$

where $Y_{it}$ is the output variable (either DALE or the attainment index), $u_i$ is inefficiency and the X’s are the independent variables.
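The estimation steps behind equations 7 and 8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the WHO code: it uses a single regressor, a small hypothetical noise-free panel, and it assumes that the dependent variable is in logs, so that the ratio in equation 8 reduces to exp(−u). The names `fixed_effects_efficiency`, `true_alpha` and `xs` are invented for the example.

```python
import math
from statistics import mean

def fixed_effects_efficiency(panel):
    """panel maps country -> list of (x, y) observations over time."""
    # Within transformation: demean x and y by country, then pool.
    means = {c: (mean(x for x, _ in obs), mean(y for _, y in obs))
             for c, obs in panel.items()}
    num = sum((x - means[c][0]) * (y - means[c][1])
              for c, obs in panel.items() for x, y in obs)
    den = sum((x - means[c][0]) ** 2
              for c, obs in panel.items() for x, _ in obs)
    beta = num / den
    # Country specific intercepts: alpha_i = ybar_i - beta * xbar_i.
    alpha = {c: ybar - beta * xbar for c, (xbar, ybar) in means.items()}
    frontier = max(alpha.values())            # the best country defines the frontier
    u = {c: frontier - a for c, a in alpha.items()}
    return beta, alpha, u

# Hypothetical noise-free panel: y = alpha_i + 0.5 * x.
true_alpha = {"A": 2.0, "B": 1.8, "C": 1.5}
xs = {"A": [1.0, 2.0, 3.0], "B": [2.0, 2.5, 4.0], "C": [0.5, 1.5, 2.0]}
panel = {c: [(x, true_alpha[c] + 0.5 * x) for x in xs[c]] for c in xs}

beta, alpha, u = fixed_effects_efficiency(panel)
# Assumption: y is in logs, so technical efficiency is exp(-u_i).
te = {c: math.exp(-ui) for c, ui in u.items()}
print(round(beta, 3), {c: round(v, 3) for c, v in te.items()})
```

Because the example contains no random error, the estimated inefficiency equals the true difference in intercepts. With noisy data any error in an estimated intercept would be counted as inefficiency, which is the difficulty discussed in the text.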

Equation 8 is modified to account for minimum health status ($M_{it}$), the level of health that would exist if each country had no health service. The resulting metric is referred to as overall efficiency ($E_i$):

$$E_{i} = \frac{E(Y_{it} \mid u_{i}, X_{it}) - M_{it}}{E(Y_{it} \mid u_{i} = 0, X_{it}) - M_{it}} \tag{9}$$

The composite minimum is similarly calculated using the total scores for fair financing and responsiveness distribution plus the weighted normalised DALE.

$$\mathrm{COMPOSITE}_{i}^{\min} = 37.5 + 25 \times \left[\frac{\mathrm{DALE}_{i} - \mathrm{DALE}_{\min}}{\mathrm{DALE}_{\max} - \mathrm{DALE}_{\min}}\right] \tag{10}$$

The DALEmin term and the minimum health term for the DALE model, $M_{it}$, are estimated using a COLS procedure. Using data from 25 countries the health measure DALE was regressed on literacy rates in 1908. The model (subtracting the lowest residual from the COLS procedure) was used to predict the minimum level of health for each country for the years 1993–1997 if there were no health service.
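The COLS step and its use in equations 9 and 10 can be sketched as below. This is an illustration under stated assumptions rather than the WHO procedure itself: the literacy and health figures are hypothetical, a single regressor is used, and the lower bound is obtained by shifting the fitted line down until it passes through the observation with the lowest residual. All function and variable names are invented for the example.

```python
def cols_lower_bound(x, y):
    """Fit y = a + b*x by OLS, then shift the intercept to the lowest residual."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # With an intercept the residuals sum to zero, so the minimum is <= 0
    # and the shifted line bounds every observation from below.
    return a + min(residuals), b

def overall_efficiency(actual, frontier, minimum):
    """Equation 9: realised share of the achievable gain above the minimum."""
    if frontier <= minimum:
        raise ValueError("frontier must exceed the minimum")
    return (actual - minimum) / (frontier - minimum)

def composite_minimum(dale_i, dale_min, dale_max):
    """Equation 10: 37.5 plus 25 times the normalised DALE."""
    return 37.5 + 25.0 * (dale_i - dale_min) / (dale_max - dale_min)

# Hypothetical historical sample: literacy rate (%) and a health measure.
literacy = [10, 20, 30, 40, 50]
health = [22, 30, 31, 41, 44]
a_min, b = cols_lower_bound(literacy, health)

# Predicted minimum health for a hypothetical country with 80% literacy.
minimum = a_min + b * 80
eff = overall_efficiency(actual=65.0, frontier=70.0, minimum=minimum)
print(round(a_min, 2), round(b, 2), round(minimum, 2), round(eff, 3))
```

In this example the ratio of actual to frontier health is 65/70, about 0.93, whereas overall efficiency is about 0.57. Subtracting the minimum from both numerator and denominator therefore makes the measure sensitive to any error in the estimated minimum.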

The measures of efficiency have a distribution which is computationally difficult to calculate because it is itself a function of other statistics. For this reason analytical methods cannot be used to generate confidence intervals; however computationally intensive simulation methods are available. To generate 80% confidence intervals around the efficiency measures Evans et al. [10] and Tandon et al. [16] bootstrapped the efficiency measures. The bootstrapping method involved drawing samples from the distribution of the dependent variable (either DALE or the composite index) and re-estimating the efficiency models. The uncertainty pertaining to the independent variables was not considered in this method, which is unfortunate given the large number of imputed values.
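A simulation of this kind can be sketched as follows. The details of the WHO sampling scheme are not given here, so the sketch makes its own assumptions: each observed value of the dependent variable is redrawn from a normal distribution with a hypothetical standard deviation `y_sd`, the regressors are held fixed, a single-regressor fixed effects model is re-estimated on each draw, and the 80% interval is taken from the 10th and 90th percentiles. The panel is hypothetical.

```python
import random
from statistics import mean

def inefficiency(panel):
    """Fixed effects slope and intercepts, then u_i = max(alpha) - alpha_i."""
    means = {c: (mean(x for x, _ in obs), mean(y for _, y in obs))
             for c, obs in panel.items()}
    num = sum((x - means[c][0]) * (y - means[c][1])
              for c, obs in panel.items() for x, y in obs)
    den = sum((x - means[c][0]) ** 2
              for c, obs in panel.items() for x, _ in obs)
    beta = num / den
    alpha = {c: ybar - beta * xbar for c, (xbar, ybar) in means.items()}
    top = max(alpha.values())
    return {c: top - a for c, a in alpha.items()}

def bootstrap_intervals(panel, y_sd, draws=500, level=0.80, seed=0):
    """Percentile intervals for u_i from redrawing the dependent variable only."""
    rng = random.Random(seed)
    samples = {c: [] for c in panel}
    for _ in range(draws):
        # Only y is redrawn; uncertainty in x is ignored, as in the text.
        redrawn = {c: [(x, rng.gauss(y, y_sd)) for x, y in obs]
                   for c, obs in panel.items()}
        for c, ui in inefficiency(redrawn).items():
            samples[c].append(ui)
    lo_q = (1 - level) / 2
    hi_q = 1 - lo_q
    intervals = {}
    for c, vals in samples.items():
        vals.sort()
        intervals[c] = (vals[int(lo_q * (draws - 1))], vals[int(hi_q * (draws - 1))])
    return intervals

# Hypothetical panel: y = alpha_i + 0.5 * x, with alpha = 2.0, 1.8 and 1.5.
true_alpha = {"A": 2.0, "B": 1.8, "C": 1.5}
xs = {"A": [1.0, 2.0, 3.0], "B": [2.0, 2.5, 4.0], "C": [0.5, 1.5, 2.0]}
panel = {c: [(x, true_alpha[c] + 0.5 * x) for x in xs[c]] for c in xs}

intervals = bootstrap_intervals(panel, y_sd=0.05)
for c, (lo, hi) in intervals.items():
    print(c, round(lo, 3), round(hi, 3))
```

Because the regressors are never redrawn, the intervals reflect only the assumed uncertainty in the dependent variable, and would be too narrow wherever the independent variables are themselves imputed.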

Replication and extension

The procedures described in the World Health Report 2000 were replicated using the WHO data to further examine the results. Table 5 below gives the coefficients reported from Evans et al. [10], Tandon et al. [16] and our replication results. Columns 1 and 2 refer to the DALE models and columns 3 and 4 to the modelling of the composite index.

The coefficients in Table 5 show that our results are very similar to those reported by Evans et al. [10], and Tandon et al. [16]. The WHO coefficients are not estimates but the averages of the bootstrapped coefficients from the uncertainty analysis. In our analysis we report only the estimated coefficients, since averaging bootstrapped coefficients introduces bias [17]. Despite this the results from the two procedures are virtually identical. The Pearson correlation coefficients between the two sets of results are 1.00 for DALEs and 0.975 for the composite index.

It is likely that OECD countries exhibit different features to non-OECD countries and to test this hypothesis we estimated separate models for OECD and non-OECD countries. Results are reported in Table 6.

Table 5. Replication of WHO results

| Variable | WHO DALE | DALE replication | WHO COMP | COMP replication |
|---|---|---|---|---|
| Health expenditure | 0.009 | 0.009* | 0.007 | 0.007* |
| Schooling | 0.063 | 0.063 | 0.050 | 0.049* |
| Schooling squared | 0.022 | 0.022 | 0.022 | 0.023* |
| Constant | 3.813 | 3.813* | 4.112 | 4.110* |
| Max(u) | 0.213 | 0.209 | 0.173 | 0.170 |
| P value from RESET test | | 0.174 | | 0.359 |

Note: *Significant using a 95% confidence interval.

Table 6. Re-estimated models for OECD and non-OECD countries

| Variable | DALEs: OECD | DALEs: Other | Composite index: OECD | Composite index: Other |
|---|---|---|---|---|
| Health expenditure | 0.016* | 0.008* | 0.014* | 0.006* |
| Schooling | 0.334* | 0.075 | 0.131 | 0.058* |
| Schooling squared | −0.026 | 0.015 | 0.014 | 0.018 |
| Constant | 3.528* | 3.767* | 4.039* | 4.075* |
| Max(u) | 0.086 | 0.259 | 0.063 | 0.209 |
| R² | 0.14 | 0.55 | 0.38 | 0.58 |
| P value from RESET test | 0.007 | 0.413 | 0.010 | 0.848 |

For the OECD and the non-OECD countries the R² values in the DALE models are 0.14 and 0.55, respectively; that is, the explanatory power in the OECD model is poor. The explanatory power for the non-OECD model is higher. The performance of the model using the OECD data suggests that countries should be stratified and analysed separately. To reinforce this conclusion an application of the RESET test for misspecification finds that the OECD model is misspecified; the non-OECD and full sample models are not. This indicates that the model used to estimate efficiency for the OECD should be different to that used for non-OECD countries.
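For readers unfamiliar with the RESET test, its logic can be sketched in a few lines. The sketch below is a generic cross-sectional version on simulated data, not a replication of the panel models in Tables 5 and 6: a linear model is fitted, powers of the fitted values are added, and an F statistic measures whether they add explanatory power. The data, the seed and the helper names `ols` and `reset_f` are hypothetical, and the p-value is omitted to keep the example free of external dependencies.

```python
import random

def ols(X, y):
    """Return (rss, fitted) from OLS, solving the normal equations by Gauss-Jordan."""
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(A[r][i]))  # partial pivoting
        A[i], A[p] = A[p], A[i]
        pivot = A[i][i]
        A[i] = [v / pivot for v in A[i]]
        for r in range(k):
            if r != i:
                factor = A[r][i]
                A[r] = [vr - factor * vi for vr, vi in zip(A[r], A[i])]
    coef = [row[k] for row in A]
    fitted = [sum(b * v for b, v in zip(coef, r)) for r in X]
    rss = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    return rss, fitted

def reset_f(X, y, powers=(2, 3)):
    """RESET F statistic: do powers of the fitted values improve the fit?"""
    rss0, fitted = ols(X, y)
    # Standardise the fitted values for numerical stability. With a constant
    # in X this spans the same augmented column space, so F is unchanged.
    m = sum(fitted) / len(fitted)
    s = (sum((f - m) ** 2 for f in fitted) / len(fitted)) ** 0.5
    Xa = [row + [((f - m) / s) ** p for p in powers] for row, f in zip(X, fitted)]
    rss1, _ = ols(Xa, y)
    q, df = len(powers), len(y) - len(Xa[0])
    return ((rss0 - rss1) / q) / (rss1 / df)

rng = random.Random(1)
x = [0.5 * i for i in range(1, 21)]
X = [[1.0, xi] for xi in x]                       # constant plus one regressor
y_lin = [1 + 2 * xi + rng.gauss(0, 0.5) for xi in x]
y_quad = [1 + 2 * xi + 0.5 * xi ** 2 + rng.gauss(0, 0.5) for xi in x]

f_lin, f_quad = reset_f(X, y_lin), reset_f(X, y_quad)
print(round(f_lin, 2), round(f_quad, 2))
```

A large F statistic, as for the data with an omitted squared term, corresponds to a small p-value and signals misspecification of the kind reported for the OECD models.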

Greene [18] has suggested an alternative approach to dealing with heterogeneity between different countries or classes of countries. While it is possible to divide the data to account for differences between countries, it is difficult to do so simply. For example, Greene [18] suggests that in sub-Saharan Africa health care systems focus on AIDS cases, while in North America and Western Europe attention is given to smaller scale disease problems and quality of life issues (e.g. cancer care). Greene [18] therefore proposes a latent class model. This method allows frontier estimation while controlling for some latent heterogeneity between units.

Discussion and conclusions

Despite our criticisms the WHO study represents a landmark for the evaluation of health systems. It is the most sophisticated cross-national assessment of system performance to date. Further, it has defended a set of importance weights which dramatically states (or, as we argue, overstates) the importance of system objectives other than health outcome.

The present analysis demonstrates that this highly innovative study is seriously flawed in several respects. First, the choice of objectives is contestable and there is a particularly strong case for omitting the equity of financing from the list and replacing it with an index of access to health services. Future studies may (and should) employ the objectives which emerge from national household surveys and qualitative analyses of system objectives. These are likely to identify additional and country specific objectives, for example, the extent to which problems identified by health care professionals coincide with the problems perceived by their clients.

Second, the WHO importance weights attached to the system objectives have not been validated. Even if, contrary to expectation, there was agreement upon objectives, it is highly unlikely that there will be agreement over the choice of weights as these reflect social or ethical rather than technical judgements. Countries with a strongly egalitarian or communitarian tradition are likely to place greater emphasis upon equity and less emphasis on responsiveness. Countries which place greater relative importance upon individualism will place greater relative importance upon responsiveness.

Variability of values does not imply the impossibility of a global analysis of system efficiency. The WHO exercise could be repeated with each country having a unique combination of importance weights, including zero where particular objectives were not of social significance. These may be constrained to achieve comparable results and applied to objectives that have been quantified in a standardised way. In principle, the task is no more or less conceptually complex than the cross-national comparison of GDP from countries with differing relative prices or the assessment of (market) value using the revealed preference criterion when the set of criteria applied by each consumer is unique to the consumer.

As with the modelling of the quality of life, there has been very little consideration of the validity of the underlying model which combines the scores and weights of the different objectives into a single index of performance. It has been shown here that the ranking of nations may be sensitive to the choice of the combination model. Different models have different properties and the appropriate property for the present exercise needs to be given careful consideration.

Finally, results here suggest that the simultaneous inclusion of every country in the analysis will lead to invalid and unreliable results. The WHO needs to consider stratifying the sample by observable characteristics. Unobserved country level characteristics will still, however, corrupt the efficiency measurement and it appears likely that this problem may have driven many of the present WHO results. The fixed effects estimator is only useful in long panels: the estimated efficiency term is only consistent as T goes to infinity. The use of long panels requires the use of time-varying methods of efficiency measurement. The WHO could re-estimate the models using a variety of parametric and non-parametric methods and use the results for model and method validation.

Unlike Williams [2,7] we do not attempt to assess the overall costs and benefits that are likely to follow from the WHO study and its reincarnations. Predictions of this type are hazardous and, as Victor Fuchs once noted, particularly when they are about the future.

Acknowledgements

The authors would like to thank David Evans for providing the data and the two anonymous referees for helpful comments, in particular the referee who provided detailed editorial correction.

References

1. World Health Organisation. The World Health Report: Health Systems. Improving Performance. World Health Organisation: Geneva, Switzerland, 2000.

2. Williams A. Science or marketing at WHO? A commentary on World Health Report 2000. Health Econ 2001; 10: 93–100.

3. Murray C, Kawabata K, Valentine N. People’s experience versus people’s expectations. Health Aff (Millwood) 2001; 20: 21–24.

4. Blendon R, Kim M, Benson J. The public versus the World Health Organisation on Health system performance. Health Aff (Millwood) 2001; 20: 10–20.

5. Reidpath DD, Allotey P, Kouame A, Cummins RA. Social cultural and environmental contexts and the measurement of burden of disease: an exploratory study in the developed and developing world. Key Centre for Women’s Health in Society, The University of Melbourne: Melbourne, 2001.

6. Wagstaff A. Measuring Equity and Health Care Financing: Reflections on an Alternative to the World Health Organisation’s Fairness of Financing Index. The World Bank: Washington, DC, 2001.

7. Williams A. Science or marketing at WHO? Rejoinder from Alan Williams. Health Econ 2001; 10: 283–286.

8. von Winterfeldt D, Edwards W. Decision Analysis and Behavioural Research. Cambridge University Press: Cambridge, MA, 1986.

9. Hollingsworth B, Wildman J. The efficiency of health production: re-estimating the WHO panel data using parametric and non-parametric approaches to provide additional information. Health Econ (forthcoming).

10. Evans DB, Tandon A, Murray CJL, Lauer JA. The comparative efficiency of national health systems in producing health: an analysis of 191 countries. GPE Discussion Paper Series: No. 29, World Health Organisation: Geneva, Switzerland, 2000.

11. Gravelle H, Jacobs R, Jones AM, Street A. Comparing the efficiency of national health systems: econometric analysis should be handled with care. Mimeo University of York, 2002.

12. Verbeek M. A Guide to Modern Econometrics. Wiley: Chichester, England, 2000.

13. Kumbhakar SC, Lovell CAK. Stochastic Frontier Analysis. Cambridge University Press: Cambridge, MA, 1990.

14. Cornwell C, Schmidt P, Sickles RC. Production frontiers with cross sectional and time-series variation in efficiency levels. J Econ 1990; 46: 185–200.

15. Hollingsworth B, Dawson P, Maniadakis N. Efficiency measurement of health care: a review of non-parametric methods and applications. Health Care Manage Sci 1990; 2: 161–172.

16. Tandon A, Murray CJL, Lauer JA, Evans DB. Measuring overall health system performance for 191 countries. GPE Discussion Paper Series: No. 30, World Health Organisation: Geneva, Switzerland, 2000.

17. StataCorp. Stata Manuals Release 7. Stata Corporation: College Station, TX, 2001.

18. Greene W. New developments in the estimation of stochastic frontier models with panel data. Remarks prepared for the 7th European Workshop on Efficiency and Productivity Analysis, University of Oviedo: Spain, September 2001.
