Leaving it to chance: the effects of random variation in shared savings arrangements

Derek DeLia

Received: 22 December 2012 / Revised: 1 August 2013 / Accepted: 30 October 2013 / Published online: 10 November 2013. © Springer Science+Business Media New York 2013

Abstract Shared savings arrangements are designed to financially reward provider

groups that reduce healthcare spending through improved care coordination. A major

concern with these arrangements is that annual changes in spending are subject to a variety

of random factors that are unrelated to care coordination efforts. As a result, resources can

be misallocated if providers who are unsuccessful at controlling spending are inappro-

priately rewarded and providers who are successful are inappropriately denied rewards.

This paper provides a systematic analysis of the role of random variation using a general

statistical model based on shared savings arrangements that are currently evolving in the

public and private sectors. The model focuses specifically on the variance of the average

savings rate (ASR), which is the quantity used to determine whether and by how much a

provider group will be rewarded. Variance in the ASR is a major driver of the probabilities

of Type I error (i.e., inappropriately rewarding providers) and Type II error (i.e., inap-

propriately failing to reward providers), which can lead to major resource misallocations.

We find that the probabilities of Type I and Type II errors associated with common

approaches to savings measurement can be quite high, often exceeding 10 or 25 %,

respectively. We also find that the likelihood of both types of errors can be substantially

reduced through careful planning and design of savings measurement schemes before

payers and providers enter into shared savings agreements.

Keywords Shared savings · Healthcare spending · Type I and Type II errors · Sample size · Accountable care · Patient centered medical homes

1 Introduction

Many are looking to shared savings arrangements as a way to slow the growth in healthcare

spending while simultaneously improving the quality of care. Under such arrangements,

D. DeLia (✉), Rutgers Center for State Health Policy, 112 Paterson St., Room 540, New Brunswick, NJ 08901, USA; e-mail: [email protected]

Health Serv Outcomes Res Method (2013) 13:219–240. DOI 10.1007/s10742-013-0110-9

provider groups that improve the coordination and delivery of care in a way that reduces

per patient spending below a targeted amount would be given a share of these savings.

Shared savings are often tied to healthcare delivery reforms such as Patient Centered

Medical Homes (PCMHs) and Accountable Care Organizations (ACOs).

Shared savings models vary in the amount of financial risk that providers are expected

to bear. In one-sided models, provider groups are rewarded for savings but are not

penalized if spending grows faster than the targeted amount. In two-sided models, provider

groups agree to pay penalties for larger-than-allowed spending increases in exchange for

the opportunity to earn greater rewards from savings. Some view shared savings

arrangements as a temporary bridge between prevailing fee-for-service reimbursement and

more prospective payment mechanisms such as bundled payments and global capitation,

which place greater financial risk onto providers (Robinson 2011; Weissman et al. 2012).

Nevertheless, this transition may take many years to unfold across the entire U.S. health

system. As a result, shared savings mechanisms are likely to evolve and remain in the

health system for many years to come.

A number of shared savings arrangements have been introduced into Medicare as part of

the Patient Protection and Affordable Care Act (PPACA). Under the Medicare Shared

Savings Program (MSSP), Medicare will enter into shared savings arrangements with

newly formed ACOs (Medicare Program 2011). Shared savings are also integral to

Medicare’s Pioneer ACO Program in which providers that already have substantial

experience in care coordination enter into more sophisticated shared savings arrangements

(Centers for Medicare & Medicaid Services 2011a). Although rules for measuring and

distributing savings achieved by Medicare ACOs have been developed, the final rules for

the MSSP allude to the idea that Medicare policy toward ACOs is likely to evolve with

accumulated experience (Medicare Program 2011).

Outside of Medicare, many states are using, or publicly contemplating the use of,

different types of shared savings arrangements as part of their efforts to control costs and

improve the delivery of care within Medicaid (DeLia and Cantor 2012; Owen 2012;

Probert 2012; Burke 2013; McGinnis and Small 2012). Typically, the approach to incor-

porating shared savings within Medicaid is similar to that used in the MSSP but the details

underlying these approaches vary widely from state to state (DeLia and Cantor 2012;

Owen 2012; Probert 2012).

In the commercial insurance market as well, there is growing interest in the shared

savings concept. Although the details of commercial arrangements are typically not dis-

seminated widely, a number of recent reports have documented a variety of newly

emerging shared savings arrangements between providers and private insurers (Robinson

2011; Weissman et al. 2012; Bailit et al. 2012; Burke 2013). These developments have led

to enormous variability in shared savings measurement and a variety of approaches to the

shared distribution of savings.

Despite this variability, however, a number of common elements have emerged in the

development of shared savings mechanisms. One key element involves the demonstration

that savings (or losses) have actually occurred. Broadly speaking, savings are measured by

comparing per capita spending for a defined group of patients before and after the relevant

delivery reform (e.g., formation of an ACO) has been implemented.

A major concern in this process is the role played by random variation in per capita

healthcare spending. Specifically, this spending could rise or fall in a given year due to a

wide variety of random factors unrelated to the provider group’s care coordination. For

example, spending could fall if patients respond more favorably than usual to ordinary

treatment or if unusually sick patients from the baseline period return to more common

utilization patterns (i.e., regression to the mean). Alternatively, spending may increase if

patients’ responses to treatments are less favorable than usual or there is a local epidemic

of infectious disease.

In response to random variation, shared savings arrangements often include a threshold

level of savings (e.g., 2 %) that must be observed before the observed savings are counted

as “real.” A key issue is where the threshold should be set. If it is set too low, then there is

a substantial risk that providers will receive rewards when no real savings are achieved.

But if the threshold is set too high, then there is a substantial risk that providers will fail to

be rewarded when real savings are achieved.

Clearly, lower thresholds are advantageous to providers, while higher thresholds are

advantageous to payers. While the exact threshold level might be negotiated by providers

and payers, it is important for both sides to have a clear understanding of how the relative

risks change in response to different threshold levels. To facilitate this understanding, it is

useful to frame these risks in terms of probabilities of Type I and Type II error in statistical

hypothesis testing. In this framework, the null hypothesis is that the delivery reform under

investigation has no effect on per capita healthcare spending.

A recent analysis of minimum savings thresholds in the MSSP finds that the proba-

bilities of Type II errors are often much larger than the corresponding probabilities of Type

I errors, making the statistical risks faced by providers larger than those faced by the

Medicare program (DeLia et al. 2012). This prior analysis also suggests that the extent of

random variation in per capita healthcare spending may have been underestimated when

setting the minimum savings thresholds for the MSSP. Failure to adequately recognize and

respond to random variation in shared savings arrangements is an important problem that

can lead to misallocation of resources away from efficient healthcare providers and toward

inefficient ones, a process that would directly undermine the overarching goals of

healthcare payment and delivery reform.

The current paper extends the prior work by DeLia et al. (2012) in two fundamental ways.

First, it considers the role of random variation in per capita healthcare spending from a much

wider range of sources and ties this variation to specific design elements of shared savings

arrangements. Second, it develops a general modeling framework to encompass shared savings

arrangements that are currently evolving in Medicaid and commercial insurance in addition to

Medicare ACO programs. The analysis is informed by current and evolving practice in public

and private sector shared savings arrangements. Based on these arrangements, we construct

statistical models that are needed to separate true provider performance from random variation

and state explicitly the key assumptions that are needed to implement these models.

Data that could measure the performance of the shared savings arrangements described

above are not publicly available as many of these arrangements are in the initial imple-

mentation stages, while others are still in planning stages. Thus, the current analysis uses

analytic formulas and numerical simulations to derive the key findings of the paper. Based

on these findings, we discuss the data collection strategies that are needed to implement

and improve the statistical efficiency of each measured savings approach. We also discuss

the analytic and practical tradeoffs associated with these approaches and conclude with

points of departure for extended analyses.

2 Emerging approaches to savings measurement

Across the public and private sectors, the measurement of savings associated with delivery

reforms involves an assessment of the change in per capita spending within a defined

patient population before and after the relevant delivery reform is implemented. To

determine whether the observed change is attributable to the relevant reform, savings

measurement schemes typically include a comparison population or a predetermined target

to which the provider group is held. As mentioned above, shared savings formulas also

often include a minimum savings threshold that must be crossed to ensure that observed

savings are not the result of random variation.

The MSSP, which allows Medicare to enter into shared savings agreements with ACOs,

has established a very detailed methodology for measuring savings (Medicare Program

2011). Specifically, the Centers for Medicare & Medicaid Services (CMS) determines the

benchmark level of per capita spending within the ACO.1 This is done by taking a

weighted average of the most recent 3 years of per capita spending among patients who

would be assigned to the ACO according to pre-existing healthcare utilization patterns.2

The benchmark value is then “updated” using the projected absolute amount of growth in

national per capita expenditures for Parts A and B services under the original Medicare

FFS program. Average per capita expenditures within the ACO in the performance year are

then compared to the updated benchmark. If performance year expenditures are less than

this benchmark by a predetermined amount, then the ACO would be eligible for a financial

reward. The predetermined amount is based on a methodology to account for random

variation, which is described below. Similarly, in the two-sided model, if performance year

expenditures are sufficiently greater than the benchmark, then the ACO would pay a

penalty.
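The benchmark calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the CMS implementation: the function name, the multiplicative trend factors, and the dollar amounts are hypothetical, and only the 0.6/0.3/0.1 weights (from the accompanying footnote) and the additive growth update come from the text.

```python
def mssp_benchmark(spend_by_year, trend_factors, projected_growth,
                   weights=(0.1, 0.3, 0.6)):
    """Weighted 3-year baseline, trended forward, plus an absolute growth update.

    spend_by_year    -- per capita spending, earliest year first (three values)
    trend_factors    -- hypothetical multiplicative factors that bring each year
                        to most-recent-year dollars (the last one is 1.0)
    projected_growth -- projected absolute growth in national per capita spending
    weights          -- earliest, middle, and most recent year weights
    """
    if not (len(spend_by_year) == len(trend_factors) == len(weights) == 3):
        raise ValueError("expected exactly three baseline years")
    baseline = sum(w * y * t
                   for w, y, t in zip(weights, spend_by_year, trend_factors))
    return baseline + projected_growth


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
benchmark = mssp_benchmark(
    spend_by_year=(10_000.0, 10_500.0, 11_000.0),
    trend_factors=(1.10, 1.05, 1.00),
    projected_growth=500.0,
)
print(round(benchmark, 2))
```

Performance year spending would then be compared with the returned value; the comparison itself, including any minimum savings threshold, is a separate step.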

The Medicare approach is often used as a template for many emerging Medicaid and

commercial ACOs that are currently under development (DeLia and Cantor 2012; Owen

2012; Probert 2012; Weissman et al. 2012; Bailit et al. 2012; Bailit and Hughes 2011).

These additional ACO programs, however, often deviate from the Medicare approach in a

variety of important details (DeLia and Cantor 2012; Owen 2012; Probert 2012; Weissman

et al. 2012; Bailit et al. 2012; Bailit and Hughes 2011). These details include the use of

fixed versus projected spending targets, development of benchmark populations, use or

non-use of risk adjustment, limits on the minimum number of patients assigned to provider

groups, and topcoding spending for high-cost outlier patients.3
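Topcoding, as defined in the accompanying footnote, is simple to express in code. The sketch below uses hypothetical patient spending amounts; the $100,000 cap is the example value from the footnote, not a prescribed standard.

```python
from statistics import mean, pstdev


def topcode(spending, cap=100_000.0):
    """Replace every amount above `cap` with `cap` itself."""
    return [min(x, cap) for x in spending]


# Hypothetical annual spending for five patients, one of them a high-cost outlier.
patients = [1_200.0, 4_500.0, 9_800.0, 15_000.0, 450_000.0]
capped = topcode(patients)

print(mean(patients), mean(capped))      # the outlier's pull on the mean shrinks
print(pstdev(patients), pstdev(capped))  # and so does the spread
```

Because the cap limits the influence of extreme values, it lowers both the calculated mean and the dispersion of spending, which is why it matters for the random variation discussed in this paper.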

In the next section, we use the most common elements of savings measurement

approaches to develop a general statistical model that is sufficiently flexible to accom-

modate many of the variations in real world savings measurement schemes. We place

particular emphasis on the way in which the chosen scheme can create different sources of

random variation in savings measurement. With this model, we derive implications for the

likelihood of Type I and II errors when setting standards for the recognition of savings.

1 All spending amounts in the MSSP are risk adjusted using CMS’s Hierarchical Condition Categories, which were developed for risk adjustment of premiums paid to Medicare Advantage plans (Ash et al. 2000).

2 In this weighting scheme, the most recent year is weighted at 0.6, the middle year at 0.3, and the earliest year at 0.1. To adjust for medical cost inflation, the middle and earliest years are “trended forward” using the national growth rate in Medicare Part A and B expenditures among fee-for-service beneficiaries.

3 Topcoding, sometimes called truncation, refers to the process where patients with spending above a certain threshold (e.g., $100,000) have their actual spending amount replaced with the threshold amount to reduce their influence on the calculated mean.


3 Sources of random variation

To understand the role of random variation in establishing savings, we build on and extend

the analytic framework developed by DeLia et al. (2012) in their more narrowly focused

assessment of the MSSP. Using their notation, we define the following key variables:

(1) Average per capita baseline spending: \(\bar{Y}_B\)

(2) Adjustment factor reflecting the target growth in per capita spending: \(A\)

(3) Average per capita performance year spending: \(\bar{Y}_P\)

Using this notation, DeLia et al. (2012) define the average savings rate (ASR) as

\[\mathrm{ASR} = \frac{(\bar{Y}_B + A) - \bar{Y}_P}{\bar{Y}_B + A}. \tag{1}\]

Savings are apparent if \(\mathrm{ASR} > 0\), reflecting the fact that performance year spending is less than baseline spending plus the allowed growth factor. Similarly, losses are apparent if \(\mathrm{ASR} < 0\). In the Medicare and other shared savings programs, ASR must cross a predetermined threshold (often based on the number of patients assigned to the provider group) for apparent savings (or losses) to be recognized as real.
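As a minimal sketch of Eq. 1 and the threshold rule, the following Python functions compute the ASR and check it against a threshold. The function names, dollar amounts, and thresholds are hypothetical and serve only to illustrate the arithmetic.

```python
def average_savings_rate(y_base, adjustment, y_perf):
    """Eq. 1: ASR = [(Y_B + A) - Y_P] / (Y_B + A)."""
    target = y_base + adjustment
    if target <= 0:
        raise ValueError("baseline plus adjustment must be positive")
    return (target - y_perf) / target


def savings_recognized(asr, threshold):
    """One-sided rule: apparent savings count only if ASR exceeds the threshold."""
    return asr > threshold


# Hypothetical per capita amounts (dollars).
asr = average_savings_rate(y_base=10_981.0, adjustment=781.0, y_perf=11_350.0)
print(round(asr, 4))
print(savings_recognized(asr, threshold=0.025))  # a lower example threshold
print(savings_recognized(asr, threshold=0.039))  # a higher example threshold
```

The same apparent savings can be recognized under one threshold and dismissed under another, which is the tension the remainder of the analysis quantifies.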

The final thresholds used in the MSSP are derived from a fairly rigid set of statistical

assumptions spelled out systematically by DeLia et al. (2012). The most important of these

assumptions in the context of this paper are: (1) inferences about the ASR are conditional

on observed values for \(\bar{Y}_B\) and (2) the adjustment factor \(A\) is known with certainty. Under these assumptions, the only component of the ASR that is subject to random variation is \(\bar{Y}_P\). More specifically, \(\bar{Y}_P\) can be modeled as

\[\bar{Y}_P = \frac{\sum_{i=1}^{N} (\mu_P + e_{Pi})}{N},\]

where \(\mu_P\) is the true underlying mean in per capita spending in the performance period, \(e_{Pi}\) is the deviation from this mean for patient \(i\) in the performance period, and \(N\) is the number of patients assigned to the provider group. Although the expected value of \(\bar{Y}_P\) is \(\mu_P\) (i.e., \(E(e_{Pi}) = 0\)), the actual value of \(\bar{Y}_P\) will differ from \(\mu_P\) by an amount that is unobservable to the payer and provider group before they enter into a shared savings agreement. Nevertheless, for a sufficiently large \(N\), one can apply the Central Limit Theorem to \(\bar{Y}_P\) (or equivalently to \(\bar{e}_P\), which is the mean of the \(e_{Pi}\) values that will ultimately be realized) to determine the likelihood that \(\bar{Y}_P\) and \(\mu_P\) will differ by a given amount. This idea forms the basis for

setting the minimum savings rates (MSRs) that the MSSP requires ACOs to achieve before

they are given credit for savings (DeLia et al. 2012).

It is worth noting that the sample size needed to reliably use the Central Limit Theorem

depends in part on the level of skewness in the underlying distribution from which

observations are drawn. This is a particular concern for healthcare expenditure data, which

is widely known to be heavily skewed. Yu and Machlin (2004) examined this issue using

data from the Medical Expenditure Panel Survey (MEPS). They found that confidence

intervals based on the normal distribution provide reliable coverage probabilities for

sample sizes that are at least 4,000. Since the smallest sample size we consider in our

analysis below (and in the paper by DeLia et al. 2012) is 5,000, use of the Central Limit

Theorem is justified. Moreover, in shared savings programs that topcode outlier levels of

spending, the skewness problem is reduced even further.

In this paper, we relax the two assumptions described above and consider the effects of

random variation in all three key elements of the ASR (\(\bar{Y}_B\), \(A\), and \(\bar{Y}_P\)). Before doing so, it

is important to note that the sources of this variation are tightly linked to the specific design

features of shared savings arrangements. For example, if payers and providers can observe


baseline spending levels for the relevant group of patients, then the rules for measuring

savings on the basis of the ASR formula in Eq. 1 can be made conditional on observed

baseline spending. In other words, once observed, baseline spending (\(\bar{Y}_B\)) can be taken as

given and no longer subject to random variation.

In many instances, however, baseline spending is not likely to be given at the time when

stakeholders enter into shared savings arrangements for a variety of reasons. First, data lags

prevent the most timely spending data from being assembled and analyzed. Second, even if

providers have their own data in real time, they typically do not capture spending by their

patients in other healthcare settings. Third, depending on the design of the arrangement, the

composition of the patient group needed to calculate baseline spending may not be

observed until after the performance period has ended. This is the case under retrospective

patient assignment, which is mandatory in the MSSP and offered as an option for Medicare

Pioneer ACOs (Medicare Program 2011; Centers for Medicare & Medicaid Services

2011a). Specifically, patients are assigned to ACOs based on the patients’ primary care use

patterns in the performance period. This group of patients and their corresponding

spending amounts clearly are not observable at baseline. In this case, random variation in \(\bar{Y}_B\) would need to be modeled with a framework similar to the one described for \(\bar{Y}_P\) (i.e., model \(\bar{Y}_B\) as \(\bar{Y}_B = \frac{\sum_{i=1}^{N} (\mu_B + e_{Bi})}{N}\) and apply the Central Limit Theorem).

Similarly, the adjustment factor (A) may or may not be a random variable depending on

the design of the shared savings arrangement. If A is based on contemporaneous spending

growth in a comparison population, then it will be a random variable. Conversely, if A is

clearly specified in advance, then it is deterministic. In practice, currently evolving shared

savings arrangements include a wide variety of random and deterministic adjustment

factors (Bailit and Hughes 2011; Bailit et al. 2012).

The effects of random variation are captured in the variance of the ASR. To calculate

this variance, we assume that \(\bar{Y}_B\) is calculated from a random sample of \(N\) patients who have a healthcare spending distribution with mean \(\mu_B\) and variance \(\sigma_B^2\).4 Similarly, \(\bar{Y}_P\) is based on the same group of \(N\) patients with spending mean \(\mu_P\) and variance \(\sigma_P^2\). Since the same patients appear in both periods, we use the parameter \(\rho\) to represent the correlation of healthcare spending by the same patient over the two time periods and assume that \(\rho \geq 0\), which is commonly observed in practice. Finally, the random variable \(A\) has mean represented by \(\mu_A\) and variance represented by \(\sigma_A^2\).

With this information, the variance of the ASR can be approximated using the Taylor

series linearization method (see Appendix) as given below:

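\[V(\mathrm{ASR}) = \frac{\sigma_P^2/N}{(\mu_B + \mu_A)^2} + \frac{\mu_P^2(\sigma_B^2/N)}{(\mu_B + \mu_A)^4} + \frac{\mu_P^2(\sigma_A^2/N)}{(\mu_B + \mu_A)^4} + 2\left[\frac{-\mu_P \cdot \mathrm{cov}(\bar{Y}_P, \bar{Y}_B)}{(\mu_B + \mu_A)^3} + \frac{-\mu_P \cdot \mathrm{cov}(\bar{Y}_P, A)}{(\mu_B + \mu_A)^3} + \frac{\mu_P^2 \cdot \mathrm{cov}(\bar{Y}_B, A)}{(\mu_B + \mu_A)^4}\right] \tag{2}\]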

4 In the MSSP, \(\bar{Y}_B\) is based on a 3-year weighted average, while in other arrangements, it may be based on a non-weighted average or a single year of data. Although these different approaches to baseline measurement could be modeled within this analytic framework, they would add complexity without altering the main conclusions of the paper.


Notice that if the only random variable is \(\bar{Y}_P\), then the variance and covariance terms involving \(\bar{Y}_B\) and \(A\) will equal zero, and thus, \(V(\mathrm{ASR}) = \frac{\sigma_P^2/N}{(\mu_B + \mu_A)^2}\), which is the result derived by DeLia et al. (2012).

If A is set in such a way that it is independent of the activities of the provider group

potentially eligible for shared savings (as is likely the case in most arrangements), then the

covariance terms involving A are equal to zero. Moreover, as shown in the Appendix,

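\(\mathrm{cov}(\bar{Y}_P, \bar{Y}_B) = \rho\sigma_P\sigma_B/N\). Thus, Eq. 2 can be rewritten as:

\[V(\mathrm{ASR}) = \frac{\sigma_P^2/N}{(\mu_B + \mu_A)^2} + \frac{\mu_P^2(\sigma_B^2/N)}{(\mu_B + \mu_A)^4} + \frac{\mu_P^2(\sigma_A^2/N)}{(\mu_B + \mu_A)^4} - 2\left[\frac{(\mu_P\rho\sigma_P\sigma_B)/N}{(\mu_B + \mu_A)^3}\right] \tag{3}\]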

Here the effect of randomness in each of the key variables on V(ASR) becomes clear.

Random variation in \(A\) adds the term \(\frac{\mu_P^2(\sigma_A^2/N)}{(\mu_B + \mu_A)^4}\), which clearly increases V(ASR). Random variation in \(\bar{Y}_B\) adds the term \(\frac{\mu_P^2(\sigma_B^2/N)}{(\mu_B + \mu_A)^4} - 2\left[\frac{(\mu_P\rho\sigma_P\sigma_B)/N}{(\mu_B + \mu_A)^3}\right]\). If \(\rho = 0\), then this term clearly increases V(ASR). However, if \(\rho\) is sufficiently large, then random variation in \(\bar{Y}_B\) could decrease V(ASR).

Suppose, for example, that the provider group produces no savings and there is no

change in the variance of healthcare spending, which is in effect the null hypothesis that an

MSR threshold would be designed to confirm or reject. Then we can write \(\mu_P = \mu_B + \mu_A\) and \(\sigma_P^2 = \sigma_B^2 = \sigma^2\). Plugging this information into the term created by random variation in \(\bar{Y}_B\) reduces it to \((1 - 2\rho)(\sigma^2/N)/(\mu_B + \mu_A)^2\), which is negative if and only if \(\rho > 0.5\) (see Appendix). This result is consistent with a well established rule that pairing observations in an experimental setting reduces variance if and only if the outcome correlation within subjects exceeds 0.5

(van Belle 2002).
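The decomposition in Eq. 3 can be checked numerically. The sketch below returns the three groups of terms separately so that the sign of the baseline contribution is visible. It assumes, as Eq. 3 does, that the covariance terms involving A are zero; the function name and the spending standard deviation of $25,000 are hypothetical choices for illustration, not values taken from the paper.

```python
def asr_variance_terms(mu_b, mu_a, mu_p, sigma_p, sigma_b, sigma_a, rho, n):
    """Split Eq. 3 into its performance, baseline, and adjustment contributions."""
    d = mu_b + mu_a
    performance = (sigma_p ** 2 / n) / d ** 2
    baseline = (mu_p ** 2 * (sigma_b ** 2 / n) / d ** 4
                - 2.0 * (mu_p * rho * sigma_p * sigma_b / n) / d ** 3)
    adjustment = mu_p ** 2 * (sigma_a ** 2 / n) / d ** 4
    return performance, baseline, adjustment


# Null case: mu_P = mu_B + mu_A and equal variances in both periods.
SIGMA = 25_000.0  # hypothetical standard deviation of per patient spending
for rho in (0.0, 0.33, 0.5, 0.67):
    perf, base, adj = asr_variance_terms(
        mu_b=10_981.0, mu_a=781.0, mu_p=11_762.0,
        sigma_p=SIGMA, sigma_b=SIGMA, sigma_a=0.0, rho=rho, n=5_000,
    )
    print(rho, perf, base, perf + base + adj)
```

Under these null-case inputs the baseline contribution is positive for correlations below 0.5, essentially zero at 0.5, and negative above it, in line with the pairing rule cited above.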

4 Hypothesis testing with minimum savings thresholds

As shown above, random variability in the components of the ASR can add to its variance.

Greater variance increases the probability of Type I error (i.e., rewarding savings that are

apparent but not real) and Type II error (i.e., failing to reward true savings that are not

apparent). To illustrate these impacts quantitatively, we calculate the probabilities of both

types of error under three approaches that reflect the sources of random variation in the

ASR, which are directly influenced by the design and data availability for savings mea-

surement. The three approaches are as follows:

• Approach 1: Only per capita spending in the performance period (\(\bar{Y}_P\)) is a random variable, while baseline spending (\(\bar{Y}_B\)) and the adjustment factor (\(A\)) are deterministic.

• Approach 2: Per capita spending at baseline and in the performance period are random

variables, while the adjustment factor is deterministic.

• Approach 3: Per capita spending at baseline and in the performance period as well as

the adjustment factor are all random variables.

To simplify the calculations, we consider only one-sided models.

In our framework, a provider group is eligible for shared savings if its ASR exceeds a

threshold k. For illustrative purposes, we use the thresholds set by the MSSP in our

calculations below (although different thresholds could easily be used in other


arrangements). Specifically, we set k equal to 0.039, 0.025, or 0.022 for provider groups

with 5,000; 20,000; or 50,000 assigned patients, respectively. These thresholds were

developed so that the probability of Type I error would be 0.10, 0.05, or 0.01 for ACOs

with 5,000, 20,000, or 50,000 patients, respectively, under the assumption that per capita

spending in the performance period is the only random variable.

To assess the impact of random variation on the probability of Type I and II error, we

assume that \(\mu_P = (1 - s)(\mu_B + \mu_A)\). In calculating the Type I error probability, we consider the null hypothesis that the provider group generates no savings or losses, which is

equivalent to setting s = 0. The calculation of Type II error probability requires a specific

alternative hypothesis. For illustrative purposes, we consider as the alternative hypothesis

the case where the provider group generates a moderate savings of 3.5 %, which is

equivalent to setting s = 0.035.

In Approaches 1 and 2 (where the adjustment factor is deterministic), \(A = \mu_A\) with certainty. In Approach 3 (where the adjustment factor is a random variable), we specify a probability distribution for \(A\) that is centered around \(\mu_A\) as described below.

Following the scenario analysis of DeLia et al. (2012), we assume that

\(\mu_B + \mu_A\) = $11,762, which was the average level of per capita Medicare spending in 2010. To derive a value for \(\mu_A\), we calculate the annual growth in per capita Medicare spending from 2000 to 2010 using information from the Centers for Medicare and Medicaid Services (2011b). We find that this growth varied from a minimum of $228 in 2000 to a maximum of $1,334 in 2006. Thus, we set \(\mu_A = 781\), which is the midpoint between these two numbers. In our analysis for Approach 3, we specify a probability distribution for \(A\) with mean \(\mu_A = 781\) in a series of simulations described

in Sect. 4.2.
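To see how a threshold, a sample size, and a Type I error target fit together when performance period spending is the only random variable, one can invert the normal approximation for the ASR, whose standard deviation in that case is \((\sigma/\sqrt{N})/(\mu_B + \mu_A)\). The sketch below is illustrative only: the function name is hypothetical, and the spending standard deviation of $25,000 is an assumed value, not one reported in this paper.

```python
from math import sqrt
from statistics import NormalDist


def min_savings_threshold(alpha, n, sigma, target_spending):
    """Threshold k such that P(ASR > k | no true savings) = alpha.

    Uses the normal approximation with only performance-period spending random,
    so sd(ASR) = (sigma / sqrt(n)) / target_spending.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    return z * (sigma / sqrt(n)) / target_spending


SIGMA = 25_000.0   # hypothetical standard deviation of per patient spending
TARGET = 11_762.0  # mu_B + mu_A from the scenario above

for n, alpha in ((5_000, 0.10), (20_000, 0.05), (50_000, 0.01)):
    k = min_savings_threshold(alpha, n, SIGMA, TARGET)
    print(n, alpha, round(k, 3))
```

With this assumed standard deviation, the computed thresholds should land close to the MSSP values used in our calculations, which suggests the order of magnitude of patient-level variation implied by those thresholds. A different assumed value would shift all three thresholds proportionally.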

Since V(ASR) depends on the within-patient spending correlation coefficient \(\rho\), we consider different values of \(\rho\) in each approach. Specifically, we examine \(\rho\) values equal to 0 (indicating independence between \(\bar{Y}_B\) and \(\bar{Y}_P\)) as well as 0.33 and 0.67, which straddle the key \(\rho\) value of 0.5 described above.

4.1 Random baseline with deterministic adjustment factor (Approaches 1 and 2)

To compare the probabilities of Type I and II error under Approaches 1 and 2, we derive

the corresponding probability distributions for ASR. We then use these distributions to

calculate the probabilities that ASR exceeds the given threshold k. To facilitate calculations, we assume that the variance in healthcare spending is the same in the baseline and performance periods, i.e., \(\sigma_P^2 = \sigma_B^2 = \sigma^2\).

To derive the probability distribution for ASR under Approach 1, we rewrite Eq. 1 as

follows:

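\[\mathrm{ASR} = 1 - \frac{\bar{Y}_P}{\bar{Y}_B + A}. \tag{4}\]

Under Approach 1, \(\bar{Y}_B\) and \(A\) are fixed at \(\mu_B\) and \(\mu_A\), respectively, and therefore, \(E(\mathrm{ASR}) = 1 - \frac{\mu_P}{\mu_B + \mu_A} = 1 - \frac{(1 - s)(\mu_B + \mu_A)}{\mu_B + \mu_A} = s\). As shown above, \(V(\mathrm{ASR}) = \frac{\sigma^2/N}{(\mu_B + \mu_A)^2}\) under Approach 1. With this information, we can apply the Central Limit Theorem to \(\bar{Y}_P\), which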

means for a sufficiently large number of assigned patients, ASR will be approximately

normally distributed. As a result, we can calculate the probability that ASR exceeds the

threshold k in Approach 1 as (see Appendix):


\[P_1 = 1 - \Phi\left[\frac{(k - s)(\mu_B + \mu_A)}{\sigma/\sqrt{N}}\right], \tag{5}\]

where \(\Phi(\cdot)\) is the cumulative distribution function for a standard normal variable. Under Approach 2, we can apply the Central Limit Theorem to \(\bar{Y}_P\) and \(\bar{Y}_B\), which makes ASR a

ratio of two normally distributed random variables. Using the approximation formulas

derived by Marsaglia (1965, 2006) for a ratio of normals (see Appendix), we calculate the

probability that ASR exceeds the threshold k in Approach 2 as follows:

\[P_2 = \Phi\left[\frac{(s - k)(\mu_B + \mu_A)}{\sqrt{(\sigma^2/N)\,(k^2 + 2(1 - k)(1 - \rho))}}\right]. \tag{6}\]

When \(s = 0\) (i.e., no real savings exist), \(P_1\) and \(P_2\) measure the probabilities of Type I error in Approaches 1 and 2, respectively. When \(s > 0\) (i.e., true savings do exist), \(P_1\) and \(P_2\) measure the statistical power to detect savings in each approach, and therefore, the probability of Type II error in Approach j is \(1 - P_j\).
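Equations 5 and 6 translate directly into code. The sketch below evaluates both probabilities with Python's standard library; the function names and the spending standard deviation of $25,000 are hypothetical, so the printed values illustrate the mechanics rather than reproduce any table in this paper.

```python
from math import sqrt
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF


def p_reward_approach1(s, k, target, sigma, n):
    """Eq. 5: P(ASR > k) when only performance-period spending is random."""
    return 1.0 - _PHI((k - s) * target / (sigma / sqrt(n)))


def p_reward_approach2(s, k, target, sigma, n, rho):
    """Eq. 6: P(ASR > k) when baseline and performance spending are both random."""
    scale = sqrt((sigma ** 2 / n) * (k ** 2 + 2.0 * (1.0 - k) * (1.0 - rho)))
    return _PHI((s - k) * target / scale)


TARGET, SIGMA = 11_762.0, 25_000.0  # SIGMA is an assumed, hypothetical value

# Type I error: probability of a reward when true savings are zero (s = 0).
type1_a1 = p_reward_approach1(0.0, 0.039, TARGET, SIGMA, 5_000)
type1_a2 = p_reward_approach2(0.0, 0.039, TARGET, SIGMA, 5_000, rho=0.0)

# Type II error: probability of no reward when true savings are 3.5 %.
type2_a1 = 1.0 - p_reward_approach1(0.035, 0.025, TARGET, SIGMA, 20_000)
type2_a2 = 1.0 - p_reward_approach2(0.035, 0.025, TARGET, SIGMA, 20_000, rho=0.0)

print(type1_a1, type1_a2)
print(type2_a1, type2_a2)
```

Setting `s` to zero yields Type I error probabilities, while a positive `s` yields statistical power, whose complement is the Type II error probability.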

To compare the statistical properties of the ASR under the two approaches, let \(x = s - k\) and define \(f(x) = P_2 - P_1\). Then as shown in the Appendix,

\[f(x) = \Phi(Jx) - \Phi(Ix), \tag{7}\]

where \(J = \frac{\mu_B + \mu_A}{\sqrt{(\sigma^2/N)(k^2 + 2(1 - k)(1 - \rho))}}\) and \(I = \frac{\mu_B + \mu_A}{\sigma/\sqrt{N}}\). The function f(x) can be used to compare Approach 1 to Approach 2 in terms of Type I and II error probabilities for different configurations of the true savings rate s, the threshold requirement k, and within-patient spending correlation \(\rho\). As shown below, much depends on whether \(s > k\), which would make \(x > 0\) in Eq. 7, or whether the opposite is true. For example, if \(s = 0\) and \(f(x) > 0\), then the probability of Type I error is greater in Approach 2 compared to Approach 1. Alternatively, if \(s > 0\) and \(f(x) < 0\), then \(P_2 < P_1\), and therefore, \(1 - P_2 > 1 - P_1\). In this case, the probability of Type II error would be greater in Approach 2 compared to Approach 1.

A complete set of configurations with particular relevance for savings measurement design is derived in the Appendix and shown in Table 1. Table 1 shows that neither approach dominates the other in terms of statistical accuracy across

all configurations. Nevertheless, some configurations are either impossible or less likely

than others in real world shared savings arrangements.
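The sign of f(x) in Eq. 7 depends only on x and on whether J exceeds I, and the ratio J/I depends only on k and the correlation, since spending levels, variance, and sample size cancel. The following sketch makes that dependence explicit. The function names are hypothetical, and the code restates Eq. 7 rather than reproducing Table 1.

```python
from math import sqrt


def j_over_i(k, rho):
    """Ratio J / I from Eq. 7, which equals 1 / sqrt(k^2 + 2(1 - k)(1 - rho))."""
    return 1.0 / sqrt(k * k + 2.0 * (1.0 - k) * (1.0 - rho))


def sign_of_f(s, k, rho):
    """Sign of f(x) = Phi(Jx) - Phi(Ix), with x = s - k.

    Phi is increasing and I > 0, so the sign of f equals the sign of (J/I - 1) * x.
    Returns +1 if P2 > P1, -1 if P2 < P1, and 0 if they are equal.
    """
    diff = (j_over_i(k, rho) - 1.0) * (s - k)
    return (diff > 0) - (diff < 0)


# No true savings (s = 0): +1 means Approach 2 is more prone to Type I error.
print(sign_of_f(0.0, 0.039, rho=0.0), sign_of_f(0.0, 0.039, rho=0.67))

# True savings of 3.5 %: -1 means Approach 2 is more prone to Type II error.
print(sign_of_f(0.035, 0.025, rho=0.0), sign_of_f(0.035, 0.025, rho=0.67))
```

Solving \(k^2 + 2(1 - k)(1 - \rho) = 1\) for k gives the roots \(k = 1 - 2\rho\) and \(k = 1\), so J exceeds I exactly when k lies strictly between those two values.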

First, consider probabilities of Type I error. When \(s = 0\), the configuration \(s > k\) is impossible, since a savings threshold would never be negative. When \(s = k = 0\), then according to Eqs. 5 and 6 the probability of Type I error is equal to 0.5 in both approaches. If \(k > s = 0\), then the role of within-patient spending correlation \(\rho\) takes on particular importance. If, in this case, \(\rho = 0\), then the last two rows of Table 1 would be irrelevant since the condition \(k \geq 1\) would never occur in practice, and thus, Approach 2 clearly would be more prone to Type I error. The upper bound on k is even tighter than 1, since MSR thresholds in practice rarely exceed 0.05 (Weissman et al. 2012). Thus, the last two rows of Table 1 are irrelevant for Type I error if \(1 - 2\rho > 0.05\), or \(\rho < 0.475\). Conversely, the first two rows of Table 1 with respect to Type I error would be irrelevant if \(\rho > 0.5\), since this would create the condition that \(k < 0\), which is also impossible in practice. Thus, Approach 1 would produce greater Type I error in this case, which is consistent with the statement above that Approach 1 has a larger value for V(ASR) when \(\rho > 0.5\). If \(0.475 \leq \rho \leq 0.5\), then the relative probability of Type I error depends on where k is set relative to \(\rho\).


In terms of Type II error probability, there is no difference between Approaches 1 and 2

whenever s = k or k =1 - 2q. As discussed in the Type I error analysis above, if

q\ 0.475, then the last two rows of Table 1 would be irrelevant. In this case, the relative

probability of Type II error depends on whether the MSR threshold k exceeds the

underlying savings rate $s$. Specifically, Approach 2 would be more favorable to provider groups that generate true savings falling between 0 and the MSR threshold, since Table 1 shows a higher Type II error probability under Approach 1 when $s < k$. Approach 1 is more favorable to provider groups that generate true savings above the MSR threshold.

Conversely, if $\rho > 0.5$, then, as also described above, the first row of Table 1 is not applicable. As a result, the relationship between $s$ and $k$ in predicting the relative Type II error is reversed relative to the case where $\rho < 0.475$. Similar to the Type I error analysis, there is no clear prediction for relative Type II error when $0.475 \leq \rho \leq 0.5$.
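The $1 - 2\rho$ cutoff that organizes Table 1 can be traced to the standardizing terms in Statements A4 and A5 of the Appendix. The short derivation below is a sketch under the same assumptions used there (a common variance $\sigma^2$ in both periods and a deterministic adjustment factor). The symbol $R$, the ratio of the ASR standard deviation under Approach 2 to that under Approach 1, is introduced here only for illustration.

```latex
% Ratio of standard deviations, read off the denominators of Statements A4 and A5
R^2 = k^2 + 2(1-k)(1-\rho)

% Subtract 1 and factor
R^2 - 1 = (1-k)^2 - 2\rho(1-k) = (1-k)\,(1 - k - 2\rho)

% For any realistic threshold (k < 1) the sign is set by the second factor
R > 1 \iff k < 1 - 2\rho, \qquad
R = 1 \iff k = 1 - 2\rho, \qquad
R < 1 \iff k > 1 - 2\rho
```

A larger standard deviation pulls the probability of crossing the threshold toward 0.5. That raises the probability when the true savings rate lies below $k$ and lowers it when the true savings rate lies above $k$, which matches the pattern of entries in Table 1.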

Table 2 provides specific examples of relative Type I and II error probabilities using MSR thresholds from the MSSP. In the cases where $\rho = 0$ or $\rho = 0.33$, the probability of Type I error is always much greater in Approach 2. As expected, a large within-patient spending correlation ($\rho = 0.67$) reverses this pattern so that the probability of Type I error is always greater in Approach 1.

A qualitatively similar pattern appears for probabilities of Type II error for provider groups with 20,000 or 50,000 assigned patients (i.e., error probabilities are higher in Approach 2 for $\rho = 0$ or $\rho = 0.33$ but higher in Approach 1 for $\rho = 0.67$). The reverse happens for provider groups with 5,000 assigned patients. This is because at 5,000 patients the MSR threshold $k$ is set at 0.039, which is greater than the underlying true savings rate ($s = 0.035$) that we are considering in this example. We note as well that Type II error probabilities are often much greater than Type I error probabilities under both approaches regardless of the value for $\rho$.
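The closed-form expressions in Statements A4 and A5 of the Appendix can be evaluated directly. The sketch below is illustrative only. The function names are hypothetical, and the coefficient of variation `cv`, defined here as $\sigma / (\mu_B + \mu_A)$, is set to 2.15 as an assumption: that value was chosen because it approximately reproduces a 10 % Type I error at the 5,000-patient threshold, and it is not a parameter reported in this section.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def reward_prob_approach1(s, k, n, cv):
    """P(ASR > k) when only performance-year spending is random (Statement A4).

    cv is sigma / (mu_B + mu_A); an assumed input, not a value from the paper.
    """
    return 1.0 - PHI((k - s) * sqrt(n) / cv)


def reward_prob_approach2(s, k, n, cv, rho):
    """P(ASR > k) when baseline spending is also random (Statement A5)."""
    scale = sqrt(k**2 + 2.0 * (1.0 - k) * (1.0 - rho))
    return PHI((s - k) * sqrt(n) / (cv * scale))


def error_probs(k, n, cv, rho, s_true=0.035):
    """Type I error uses s = 0; Type II error uses the assumed true rate s_true."""
    return {
        "type1_a1": reward_prob_approach1(0.0, k, n, cv),
        "type1_a2": reward_prob_approach2(0.0, k, n, cv, rho),
        "type2_a1": 1.0 - reward_prob_approach1(s_true, k, n, cv),
        "type2_a2": 1.0 - reward_prob_approach2(s_true, k, n, cv, rho),
    }


if __name__ == "__main__":
    CV = 2.15  # assumed coefficient of variation (hypothetical)
    for n, k in [(5_000, 0.039), (20_000, 0.025), (50_000, 0.022)]:
        probs = error_probs(k, n, CV, rho=0.0)
        print(n, {name: round(100 * p, 1) for name, p in probs.items()})
```

With these assumed inputs, the 5,000-patient results fall close to the corresponding row of Table 2. Other choices of `cv` would shift every probability, so the output should be read as a consistency check rather than a replication.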

4.2 Impact of random baseline values with random adjustment factor (Approach 3)

Given the complexity of the ASR when all three key variables are random, we use a

simulation methodology to assess the probabilities of Type I and II errors in Approach 3.

Using the Central Limit Theorem, $\bar{Y}_B$ and $\bar{Y}_P$ are jointly normally distributed with mean vector $(\mu_B, \mu_P)$ and variance–covariance matrix

$$V = \frac{\sigma^2}{N} \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.$$

Based on the recent

growth in per capita Medicare spending described above, we assume that A is uniformly

distributed over the interval from 288 to 1,334. We conduct additional simulations

assuming that A is uniformly distributed over the interval from 661 to 881 to illustrate the

effects when A has a smaller variance but the same mean.

To conduct the simulations, we take random samples from the joint distribution of $\bar{Y}_B$ and $\bar{Y}_P$ using the “drawnorm” command and a random sample from the distribution of $A$ using the “runiform” command in Stata 11.1. Following the methodology in Sect. 4.1, we conduct simulations assuming that providers have an assigned number of patients equal to 5,000; 20,000; and 50,000.

Table 1 Relative probabilities of Type I and II errors under Approaches 1 and 2

| Configuration | Higher Type I error: $s < k$ | Higher Type I error: $s = k$ | Higher Type I error: $s > k$ | Higher Type II error: $s < k$ | Higher Type II error: $s = k$ | Higher Type II error: $s > k$ |
|---|---|---|---|---|---|---|
| $k < 1 - 2\rho$ | Approach 2 | Same | N/A | Approach 1 | Same | Approach 2 |
| $k = 1 - 2\rho$ | Same | Same | N/A | Same | Same | Same |
| $k > 1 - 2\rho$ | Approach 1 | Same | N/A | Approach 2 | Same | Approach 1 |

Table 3 shows the results of the simulations. With small variance in A, the probabilities

of Type I error are consistently larger in Approach 3 relative to Approach 2 (which are

consistently larger relative to Approach 1), especially when $\rho = 0.67$. With large variance

in A, the probabilities of Type I error are larger still. For provider groups with 20,000 or

50,000 assigned patients, the probabilities of Type II error are also consistently larger in

Approach 3 relative to the corresponding probabilities in Approaches 1 and 2, with larger

differences occurring in the case of large variance in A. For provider groups with 5,000

assigned patients, the probability of Type II error is similar across all three approaches,

regardless of the variance in A. This result is largely influenced by the fact that the assumed

effect size (s = 0.035) is less than the applicable minimum savings threshold (k = 0.039).

It is also of note that the parameter $\rho$, which was so influential under Approach 2, has

essentially no influence on Type I or II error probabilities under Approach 3.
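The simulation design described above can be sketched in a few lines. The paper reports using Stata's drawnorm and runiform commands; the version below swaps in Python's standard library and is an illustration, not the author's code. The function name, the baseline mean `mu_b`, and the per-patient standard deviation `sigma` are hypothetical placeholders, so the resulting probabilities are not expected to match Table 3.

```python
import random
from math import sqrt


def simulate_reward_prob(s, k, n, rho, a_low, a_high, mu_b, sigma,
                         draws=20_000, seed=0):
    """Monte Carlo estimate of P(ASR > k) when Y_B, Y_P and A are all random.

    mu_b and sigma are placeholder inputs (assumptions, not values from the paper).
    """
    rng = random.Random(seed)
    mu_a = (a_low + a_high) / 2.0
    mu_p = (1.0 - s) * (mu_b + mu_a)  # true savings rate s relative to benchmark
    se = sigma / sqrt(n)              # standard deviation of each sample mean
    rewarded = 0
    for _ in range(draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y_b = mu_b + se * z1
        # Second mean built so that corr(y_b, y_p) equals rho
        y_p = mu_p + se * (rho * z1 + sqrt(1.0 - rho**2) * z2)
        a = rng.uniform(a_low, a_high)  # random adjustment factor
        asr = ((y_b + a) - y_p) / (y_b + a)
        rewarded += asr > k
    return rewarded / draws


if __name__ == "__main__":
    common = dict(k=0.022, n=50_000, rho=0.0, mu_b=9_000.0, sigma=21_000.0)
    for label, (lo, hi) in {"small": (661, 881), "large": (288, 1334)}.items():
        type1 = simulate_reward_prob(0.0, a_low=lo, a_high=hi, **common)
        type2 = 1.0 - simulate_reward_prob(0.035, a_low=lo, a_high=hi, **common)
        print(label, "variance in A:", round(type1, 3), round(type2, 3))
```

Setting `s = 0` gives the Type I error probability, and one minus the reward probability at `s = 0.035` gives the Type II error probability. Widening the interval for `A` while holding its mean fixed isolates the effect of variance in the adjustment factor.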

4.3 Risk adjustment

Shared savings arrangements vary extensively according to whether and how they incor-

porate direct risk adjustment methods into their measured savings calculations. For

example, a recent survey of key informants involved with 27 shared savings arrangements

found that one-half do not use any risk adjustment techniques at all (Bailit and Hughes

2011). According to that study, “Models electing not to use risk adjustments were more likely to be medical home initiatives or believed that risk adjustment was not imperative because the shared-savings model involved comparing the provider’s performance to its own past performance, with an assumption that the patient population risk burden would not vary much from year to year (Bailit and Hughes 2011).”

Table 2 Probabilities of Type I and II errors with different sources of random variation in the average savings rate

| $\rho$ | Number of patients | MSR threshold | Type I error, Approach 1 (%) | Type I error, Approach 2 (%) | Type II error, Approach 1 (%) | Type II error, Approach 2 (%) |
|---|---|---|---|---|---|---|
| 0 | 5,000 | 0.039 | 10 | 17.8 | 55.2 | 53.8 |
| 0 | 20,000 | 0.025 | 5 | 12.0 | 25.5 | 31.9 |
| 0 | 50,000 | 0.022 | 1 | 5.1 | 8.8 | 16.7 |
| 0.33 | 5,000 | 0.039 | 10 | 12.9 | 55.2 | 54.6 |
| 0.33 | 20,000 | 0.025 | 5 | 7.5 | 25.5 | 28.3 |
| 0.33 | 50,000 | 0.022 | 1 | 2.3 | 8.8 | 11.9 |
| 0.67 | 5,000 | 0.039 | 10 | 5.4 | 55.2 | 56.6 |
| 0.67 | 20,000 | 0.025 | 5 | 2.0 | 25.5 | 20.6 |
| 0.67 | 50,000 | 0.022 | 1 | 0.2 | 8.8 | 4.6 |



Shared savings arrangements that do incorporate risk adjustment involve a widely

varying and complex series of calculations. For example, the MSSP uses a complex risk

adjustment methodology that involves different risk adjustment factors for Medicare

subpopulations (e.g. dual eligibles), which are multiplied by per capita spending amounts

within these subpopulations and then aggregated to produce a final risk-adjusted per capita

spending amount (Centers for Medicare & Medicaid Services 2013; Kauter 2013).5 Pub-

licly available reports of other shared savings arrangements that do use risk adjustment

include very little detail regarding methodology (Bailit and Hughes 2011; Bailit et al.

2012; Weissman et al. 2012; Burke 2013). For these reasons, we are not able to directly

assess the impact of risk adjustment on random variation in the ASR in a general analytic

model.

We can, however, provide some illustration of how our modeling is affected by the use

of risk-adjusted per capita spending amounts. We do this by using an approach based on

observed-to-expected outcome ratios, which is a well established and widely used risk

adjustment framework (Iezzoni 2013). In this framework, risk-adjusted per capita spending

in the performance period $\bar{Y}^*_P$ is written as:

$$\bar{Y}^*_P = (\bar{Y}_P / \bar{E}_P) \times \bar{Z}_P, \qquad (8)$$

where $\bar{E}_P$ is the average of the expected spending amounts derived from a multiple regression model that uses patient diagnoses and demographic characteristics as independent variables and $\bar{Z}_P$ is average per capita spending for a standard population (e.g., all Medicare patients or all patients covered by a private insurer). Risk-adjusted spending in the baseline period $\bar{Y}^*_B$ is defined analogously. Replacing $\bar{Y}_P$ and $\bar{Y}_B$ with $\bar{Y}^*_P$ and $\bar{Y}^*_B$, respectively, in Eq. 4 would give the risk-adjusted version of ASR, which we refer to as ASR*.

Table 3 Probabilities of Type I and II errors with small and large variance in the adjustment factor for the average savings rate

| $\rho$ | Number of patients | MSR threshold | Type I error, small variance in A (%) | Type I error, large variance in A (%) | Type II error, small variance in A (%) | Type II error, large variance in A (%) |
|---|---|---|---|---|---|---|
| 0 | 5,000 | 0.039 | 17.6 | 22.5 | 54.2 | 54.2 |
| 0 | 20,000 | 0.025 | 12.5 | 24.8 | 32.0 | 39.4 |
| 0 | 50,000 | 0.022 | 6.1 | 26.1 | 18.1 | 35.5 |
| 0.33 | 5,000 | 0.039 | 17.0 | 22.7 | 54.2 | 53.9 |
| 0.33 | 20,000 | 0.025 | 12.4 | 24.8 | 32.7 | 39.6 |
| 0.33 | 50,000 | 0.022 | 6.2 | 26.2 | 18.2 | 35.9 |
| 0.67 | 5,000 | 0.039 | 17.0 | 21.7 | 54.7 | 51.9 |
| 0.67 | 20,000 | 0.025 | 12.6 | 24.5 | 32.4 | 39.3 |
| 0.67 | 50,000 | 0.022 | 6.2 | 26.8 | 18.1 | 35.9 |

5 Calculation of these risk adjustment factors includes rules that are designed to minimize “upcoding” of diagnoses during the performance period.

The added complexity makes closed form analysis of ASR* intractable. However, the

statistical properties of ASR* can be simplified and sources of its random variation min-

imized if several of its key components can be observed before the relevant parties enter

into a shared savings contract. Specifically, following the process outlined in Approach 1,

the baseline average expected per capita spending ($\bar{E}_B$) and per capita spending for the standard population ($\bar{Z}_B$) could be calculated and known to all parties in advance. In addition, the baseline standard population could be used as the performance year standard population (i.e., set $\bar{Z}_P = \bar{Z}_B$), making this latter quantity known in advance as well. Thus, the only sources of random variation in ASR* would be $\bar{Y}_P$ and $\bar{E}_P$. In this case, a group of payers and providers contemplating a shared savings arrangement could use available data to estimate the statistical properties of $\bar{Y}_P$ and $\bar{E}_P$ (e.g., means, variances) from prior knowledge of $\bar{Y}_B$ and $\bar{E}_B$. This information would enable a series of scenario analyses

similar to what appears above to generate a common understanding of Type I and Type II

error risks embodied in a negotiated arrangement.
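The observed-to-expected calculation in Eq. 8 and its substitution into the ASR formula can be written out as a minimal sketch. The function and argument names are hypothetical, and the dollar figures in the example are invented for illustration. Real arrangements such as the MSSP apply considerably more elaborate rules.

```python
def risk_adjusted_mean(observed_mean, expected_mean, standard_mean):
    """Eq. 8: observed-to-expected ratio scaled by the standard population mean."""
    if expected_mean <= 0:
        raise ValueError("expected_mean must be positive")
    return (observed_mean / expected_mean) * standard_mean


def asr_star(y_p, e_p, z_p, y_b, e_b, z_b, adjustment):
    """Risk-adjusted ASR: substitute the adjusted means into the ASR formula."""
    y_b_star = risk_adjusted_mean(y_b, e_b, z_b)
    y_p_star = risk_adjusted_mean(y_p, e_p, z_p)
    benchmark = y_b_star + adjustment  # adjusted baseline plus adjustment factor A
    return (benchmark - y_p_star) / benchmark


if __name__ == "__main__":
    # Illustrative numbers only: the same standard population in both periods
    # (z_p == z_b), as suggested in the text, and a fixed $300 adjustment factor.
    value = asr_star(y_p=10_200.0, e_p=10_500.0, z_p=9_000.0,
                     y_b=10_000.0, e_b=10_000.0, z_b=9_000.0,
                     adjustment=300.0)
    print(round(value, 4))
```

In this example raw spending rises from the baseline to the performance period, yet expected spending rises faster, so the risk-adjusted savings rate is positive. Holding `z_p` equal to `z_b` leaves `y_p` and `e_p` as the only quantities unknown at contract time.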

5 Discussion

5.1 Policy and practice implications

This paper provides a systematic analysis of the role of random variation in the mea-

surement of savings derived from improved healthcare coordination. It focuses specifically

on the three main components of the ASR, which is the basis for many shared savings

arrangements that are currently evolving in the public and private sectors (Medicare

Program 2011; Centers for Medicare & Medicaid Services 2011a; Weissman et al. 2012;

Bailit et al. 2012; Bailit and Hughes 2011; DeLia and Cantor 2012; Owen 2012; Probert

2012; Burke 2013; McGinnis and Small 2012).

This analysis has broad implications for the design of shared savings arrangements. In

addition, payers and providers could apply their own data to the models developed in this

paper to estimate relevant parameters and better anticipate the risks that would be incurred

under different shared savings arrangements that they are considering.

We find that random variation in per capita healthcare spending in the baseline and

performance periods and in the adjustment factor used to set a benchmark for spending

growth all contribute substantially to the probability that a provider group will be inap-

propriately rewarded or not rewarded for their performance in controlling healthcare

spending. In technical terms, we find that the probabilities of Type I and Type II errors

associated with common approaches to savings measurement can be quite high, often

exceeding 10 or 25 %, respectively. As a result, there is substantial risk that resources will

be misallocated under many current shared savings arrangements.

Intuitively, we find that the risks of both kinds of errors are reduced when the number of

patients assigned to the relevant provider group is increased. But as discussed previously in

the context of ACOs, the creation of shared savings agreements with very large provider

groups raises concerns about the creation of excess pricing power and the potential for anti-

trust violations (Richman and Schulman 2011). In addition, effective care coordination

may be much harder to achieve in very large groups of providers with no prior history of

collaborating on patient care. Very large groups are also more prone to free-riding by less



productive providers on the achievements of more productive ones. Moreover, even large

provider groups must contend with the limited precision of small patient samples if they

wish to allocate financial rewards to their provider subgroups on the basis of relative

contribution to overall healthcare spending reductions. Statistical precision from very large

patient panels is even less tenable in the case of PCMHs.

The risks of resource misallocation due to random variation can be reduced by con-

sidering these risks in the design of the savings measurement approach and by making

specific data available before implementing a shared savings agreement. For example, we

found that the probability of Type I error is much smaller when the adjustment factor for

setting the benchmark for spending growth is deterministic instead of random. With only a

few special exceptions, we find that the probability of Type II error would also be smaller.

The adjustment factor can be made deterministic by simply specifying it in advance at the

beginning of the shared savings agreement. For example, the payer and provider group

might agree that providers receive credit for savings if per capita healthcare spending

grows by less than $300 in the performance year. Knowledge of the spending goal with

certainty could carry additional advantages as well. With a predetermined benchmark,

providers would have a clear target on which to focus their efforts and would be in a better

position to evaluate and calibrate their progress before the end of the performance year.

These advantages, however, would have to be viewed alongside the possibility that the

predetermined benchmark is set too high or too low relative to what other providers outside

the shared savings arrangement are able to achieve.

With a predetermined value for benchmark spending growth, we find that the proba-

bilities of Type I and II errors might be reduced even further if the correlation of healthcare

spending by the same patients over time is especially strong (i.e., more than 0.5) and

incorporated into the measured savings formula.

The use of within-patient spending correlation to improve the statistical precision of the

ASR carries the implicit assumption that the same patients appear in the baseline and

performance periods, an assumption that we have made in the modeling above. In some

real-world shared savings arrangements, this assumption is clearly satisfied as savings are

measured only for groups of patients that meet this criterion (Weissman et al. 2012; Bailit

and Hughes 2011; Bailit et al. 2012). But in other arrangements, such as the MSSP, a

slightly different group of patients may appear in the baseline and performance periods

making it more difficult to exploit the information available from the within-patient

spending correlation.

In contrast, given a small within-patient spending correlation, savings can be measured

with greater statistical precision if performance year spending is the only source of random

variation. For this to be the case, the payer and provider group would need to know the

baseline values of per capita healthcare spending with certainty before entering into the

shared savings agreement. This might be done, for example, with claims data for a clearly

defined set of patients for whom savings will be measured.

The incorporation of these principles into real-world shared savings arrangements

depends crucially on methods used to assign patients to providers. Although many patient

assignment algorithms exist (Mehrotra et al. 2010; Pope 2011; Pantely 2011; Lewis et al.

2013), the key issue in this paper is whether the patient assignment method allows for the

measurement of key parameters in the ASR formula before the shared savings arrangement

is implemented. For example, the MSSP assigns patients to ACOs retrospectively at the

end of the performance period (Medicare Program 2011). Since the patient population is

not known until the end of the performance period, it is impossible to assess variation in

the ASR conditional on observed baseline spending at the beginning of the contract period.



In contrast, providers in the Medicare Pioneer ACO Program can choose prospective

assignment where the patient base is known at the beginning of the contract period

(Centers for Medicare & Medicaid Services 2011a). Similarly, many commercial ACOs

and PCMHs use prospective assignment as well (Weissman et al. 2012; Bailit and Hughes

2011; Bailit et al. 2012).

Another variation, used by Medicaid ACO programs in New Jersey and Colorado,

assigns patients to providers geographically based on patient residence (New Jersey Public

Law 2011; McGinnis and Small 2012). Although the geographic boundaries in these

arrangements are well defined, population mobility (and mortality) creates a situation

where the baseline and performance year populations can be somewhat different. Never-

theless, geographic assignment clearly makes it possible to calculate baseline per capita

spending in a way that it is known to participants before the shared savings contract period.

Patient assignment also affects the extent to which there may be endogenous patient

switching between provider networks in ways that may be related to healthcare quality or

spending per patient. For example, with retrospective assignment, patients who receive

poor service from a given provider network may shift the majority of their care to other

providers and thus would no longer be assigned to the original network. With prospective

or geographic assignment, providers would still be held responsible for patients who were

originally assigned to them (and who remain in the relevant service area), even if these

patients decided to receive care elsewhere.

One of the driving factors behind the probabilities of Type I and II errors in our analysis

is the minimum savings rate (MSR), which is the threshold that a provider group’s mea-

sured ASR must cross before the group is credited for producing savings. We observe in

particular that the relationship between statistical precision and the sources of random

variation in the ASR depends on whether the MSR threshold is above or below the true

ASR. This observation raises the more general question about how the MSR threshold

should be set in the first place. Our numerical analysis is based on MSR thresholds set by

the Medicare Shared Savings Program (MSSP), which implicitly assumes that the only

source of random variation in the ASR derives from performance year per capita healthcare

spending (Medicare Program 2011). These thresholds were intended to achieve specified

levels of Type I error with no direct consideration of Type II error (Medicare Program

2011).

The analyses in this paper suggest that more explicit attention to the tradeoff between

these two errors in the setting of MSR thresholds is warranted. At issue is the relative cost

of rewarding a provider group that did not achieve any savings versus denying a reward to

a provider group that did achieve savings. Part of these costs might involve what provider

groups are expected to do with the rewards that they receive. For example, in the NJ

Medicaid ACO demonstration, ACOs are required to reinvest any rewards they earn into

future care and access improvements (New Jersey Public Law 2011). Therefore, payments

made to providers as a result of Type I error might still produce some population health

benefits (albeit at higher costs) even if awarded erroneously. In other shared savings

arrangements, financial rewards might be used to enhance provider income. While this

second use of financial rewards is appropriate in certain contexts, it creates different risks

in terms of resource misallocation in the presence of Type I error.

Similarly, provider groups with very little experience in risk-based contracting, par-

ticularly relatively smaller ones, may be ill-equipped to deal with even modest probabil-

ities of Type II error in their early stages of shared savings activities. Thus, some payers

may find it useful in the early stages of shared savings arrangements to minimize the

probability of Type II error (exposing themselves to greater risk of Type I error) until their



contracting providers gain more experience with the management of risks associated with

shared savings. This is a philosophy that some payers have adopted as part of a longer term

strategy to get more providers interested in incentive-based payment models (DeLia and

Cantor 2012; Weissman et al. 2012).

5.2 Analytic extensions

Our analysis of measured savings methodology can be extended to incorporate specific

features that are unique to the many different arrangements that are evolving. For example,

in many of these arrangements, high-cost outliers are topcoded at a threshold level such as

the 99th percentile or a specific dollar figure ranging from $50,000 to $500,000 (Bailit et al.

2012; Medicare Program 2011; Weissman et al. 2012). (If spending is topcoded, for

example, at $100,000, then all patients with spending above this amount would have their

spending recorded as $100,000.) Analytically, topcoding has the effect of reducing both the

mean and the variance of per capita healthcare spending, although the effect tends to be

greater on the variance (Thomas et al. 2004; Iezzoni 2013). While this reduced variance

would lead to the desirable property of lower variance in the ASR, and therefore, lower

probabilities of Type I and II errors in the identification of savings, censoring high-cost

outliers has additional implications. Specifically, provider groups that focus on so-called

“super-user” populations with extremely high utilization levels might not receive credit for

successful interventions if censoring is in place (Gawande 2011; Linkins et al. 2008; Okin

et al. 2000; Sadowski et al. 2009).
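The effect of topcoding on the mean and variance can be illustrated with simulated data. The sketch below is a toy example: the lognormal distribution and its parameters are assumptions chosen only to produce a right-skewed spending pattern, and the helper names are hypothetical.

```python
import random
from math import ceil
from statistics import mean, pvariance


def topcode(values, cap):
    """Record every value above cap as the cap itself."""
    return [min(v, cap) for v in values]


def percentile(values, pct):
    """Nearest-rank percentile (simple illustrative implementation)."""
    ordered = sorted(values)
    rank = max(1, ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def relative_reductions(values, cap):
    """Proportional drop in the mean and in the variance after topcoding."""
    capped = topcode(values, cap)
    mean_drop = 1.0 - mean(capped) / mean(values)
    var_drop = 1.0 - pvariance(capped) / pvariance(values)
    return mean_drop, var_drop


if __name__ == "__main__":
    rng = random.Random(1)
    # Assumed skewed spending distribution (hypothetical parameters)
    spending = [rng.lognormvariate(8.0, 1.5) for _ in range(50_000)]
    cap = percentile(spending, 99)  # topcode at the 99th percentile
    mean_drop, var_drop = relative_reductions(spending, cap)
    print(f"mean falls by {mean_drop:.1%}, variance falls by {var_drop:.1%}")
```

Because the capped observations sit far in the right tail, they contribute much more to the variance than to the mean, which is why the proportional drop in variance is the larger of the two for skewed data of this kind.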

Although our analysis compares a single-year performance period with a single-year

baseline, other comparison schemes are sometimes used in shared savings arrangements.

Most prominently, the MSSP uses a three-year weighted average for baseline healthcare

spending with greater weight given to more recent years (Medicare Program 2011). Other

arrangements use a baseline period from 6 months to 2 years and a performance period

from 6 to 12 months (Weissman et al. 2012; Bailit et al. 2012). In these cases, the general

principles derived in this paper would remain the same but the calculation of within-patient

spending correlation and other relevant parameter estimates would have to be adjusted to

account for the specific lengths of baseline and performance periods.

We note as well that random variation could affect the statistical precision of healthcare

quality measures, which also form an important part of shared savings arrangements.

Achieving a desired level of statistical precision involves additional challenges when

specific measures apply only to a subset of the assigned patient population (e.g., percentage

of diabetics with properly controlled hemoglobin A1c). The nuances of statistical precision

in healthcare quality measures, though important, are beyond the scope of this paper.

Another important extension concerns the amount of over or underpayment that occurs

as a result of Type I or Type II error, respectively. Conditional on the occurrence of such

an error, the amount of over or underpayment depends on the number of patients covered

under the shared savings agreement and the underlying savings rates. It also depends on the idiosyncrasies of the specific arrangement that is implemented such as the shares of measured savings that are distributed to payers versus providers and the use of first-dollar

provider payments versus payments above a fixed savings threshold. Since Type I and II

errors are necessary conditions for these kinds of over and underpayments to occur, the

main conclusion of the paper still holds—namely, that these types of misallocations can be

minimized through careful design of the savings measurement methodology as described

above.



In practice, payers and providers may use other contractual arrangements to hedge

against the risks from random variation. These include reinsurance purchased by providers

or, as recently suggested, options pricing by payers (Friedberg 2013). The analysis above

provides a framework for pricing the relevant risks that would determine the prices and

other details of these hedging activities. It is important to note, however, that the use of

these risk hedging mechanisms would add to the administrative costs of executing shared

savings arrangements. This problem could be especially acute for smaller provider groups

for whom administrative costs and premiums connected with reinsurance may be higher

relative to larger provider groups. Thus, it would remain important to minimize the

underlying risks as discussed throughout this paper.

Similar risk hedging issues arise for providers engaged in multiple shared savings

arrangements with different payers. In theory, random underpayment from one arrange-

ment might coincide with random overpayment from another, leaving the provider group

approximately held harmless. In practice, however, one cannot guarantee that the relative

risks would be inversely correlated across shared savings arrangements. To the contrary, in

a small to medium sized provider group, consistent but moderate success in reducing

spending could go undetected across multiple arrangements if all such arrangements are

statistically noisy (e.g., as in Approach 3 described above).6

5.3 Caveats and limitations

Our analysis should be considered in light of some caveats and limitations. First, we

considered only one-sided savings models in our calculations. While this focus does not

affect our calculations of ASR variance under different measured savings approaches, the

probabilities of misclassified provider performance would clearly be different in two-sided

arrangements.

At various points of the analysis, we rely heavily on the Central Limit Theorem to justify

our use of the normal distribution in our calculations. Because healthcare spending data is

highly skewed, use of the Central Limit Theorem requires larger samples than would be the

case for samples from more symmetrical distributions. In our calculations, we use sample

sizes of 5,000 or higher, which have been shown to be adequate for skewed data (Yu and

Machlin 2004). Nevertheless, some shared savings arrangements in practice involve ACOs

and PCMHs with smaller numbers of patients. In these instances, the risks of Type I and II

errors are likely higher than those reported here and must be assessed directly without the use

of large sample approximations. Also, we relied on approximation formulas for ratios of

normal variables, which can be sensitive to skewness and, additionally, to correlation between

the relevant numerators and denominators. Although we do not expect these considerations to

add any systematic bias to our comparisons of different measured savings approaches, they

remain as issues in need of further analysis in future work.

Though not applied universally, some shared savings arrangements use risk-adjusted

measures of per capita healthcare spending (Weissman et al. 2012; Bailit and Hughes 2011;

Bailit et al. 2012). The great variety and complexity of risk adjustment methods that are used in

practice make the inclusion of risk adjustment in our models intractable. Although we outlined

some of the major considerations regarding risk adjustment and random variation in the ASR,

the role of risk adjustment in this regard remains an important area for future investigation.

6 In theory, statistical noise in this situation might be substantially reduced if all payers contracting with a given provider group pooled their data to measure savings. Many practical considerations (e.g., coordination costs, competition), however, would likely prevent this from happening in practice.



Our numerical calculations are based on MSR thresholds used by the MSSP and on

parameter assumptions used in a recent analysis of that particular program (DeLia et al.

2012). A different combination of MSR thresholds and parameter values would have

produced different probabilities of Type I and II errors. In particular, these probabilities

would have been smaller in populations with smaller healthcare spending variance.

Nevertheless, the general relationships among the ASR measurement approaches derived

in this paper are likely to hold across a wide variety of MSR thresholds and parameter

values.

In our analysis of random variation in the adjustment factor for baseline spending, we

used a simulation methodology that required specific assumptions about the probability

distribution for annual growth in per capita healthcare spending. It would be useful in

future research to examine the robustness of Type I and II error probabilities to different

assumptions about this probability distribution.

6 Conclusion

This paper shows how random variation can lead to large probabilities of Type I and II

errors in the recognition of savings that may or may not be generated by new forms of

patient care management. It also suggests ways in which savings measurement rules can be

designed to limit this random variation. Although such rules must be designed to meet

other implementation goals, our analysis provides insights and general formulas that can be

used to guide the design of measured savings approaches across a wide range of shared

savings arrangements that are currently evolving in the American health sector, which has

just entered a period of experimentation and refinement in terms of measuring and

rewarding savings that are generated from improved care coordination.

Acknowledgments This research was supported by the Agency for Healthcare Research & Quality (Grant no. R24-HS019678). Helpful comments were provided by Sujoy Chakravarty and two anonymous reviewers.

Appendix

This appendix provides proofs of mathematical statements that are used in the main text.

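Statement A1

$$V(\mathrm{ASR}) = \frac{\sigma_P^2/N}{(\mu_B+\mu_A)^2} + \frac{\mu_P^2(\sigma_B^2/N)}{(\mu_B+\mu_A)^4} + \frac{\mu_P^2(\sigma_A^2/N)}{(\mu_B+\mu_A)^4} + 2\left[\frac{-\mu_P \cdot \mathrm{cov}(\bar{Y}_P,\bar{Y}_B)}{(\mu_B+\mu_A)^3} + \frac{-\mu_P \cdot \mathrm{cov}(\bar{Y}_P,A)}{(\mu_B+\mu_A)^3} + \frac{\mu_P^2 \cdot \mathrm{cov}(\bar{Y}_B,A)}{(\mu_B+\mu_A)^4}\right]$$

Proof To apply the needed formulas more easily, let $x_1 = \bar{Y}_P$, $x_2 = \bar{Y}_B$, and $x_3 = A$ and let $\mu_i$ and $\sigma_i^2$ be the mean and variance for $x_i$ ($i = 1, 2, 3$). Then we can write the ASR as

$$\mathrm{ASR} = f(x_1, x_2, x_3) = \frac{(x_2 + x_3) - x_1}{(x_2 + x_3)}$$

and apply the multivariate Taylor series method (Casella and Berger 2002) to approximate the variance of $f(x_1, x_2, x_3)$ as

$$V(f(x_1, x_2, x_3)) = \sum_{i=1}^{3} \sum_{j=1}^{3} f_i(\mu_1, \mu_2, \mu_3) \cdot f_j(\mu_1, \mu_2, \mu_3) \cdot \mathrm{cov}(x_i, x_j)$$

where $f_i(\mu_1, \mu_2, \mu_3)$ is the first partial derivative of $f$ with respect to $x_i$ evaluated at the point $(\mu_1, \mu_2, \mu_3)$. Taking derivatives and rearranging terms gives:

$$V(f(x_1, x_2, x_3)) = \frac{\sigma_1^2}{(\mu_2+\mu_3)^2} + \frac{\mu_1^2\sigma_2^2}{(\mu_2+\mu_3)^4} + \frac{\mu_1^2\sigma_3^2}{(\mu_2+\mu_3)^4} + 2\left[\frac{-\mu_1\,\mathrm{cov}(x_1,x_2)}{(\mu_2+\mu_3)^3} + \frac{-\mu_1\,\mathrm{cov}(x_1,x_3)}{(\mu_2+\mu_3)^3} + \frac{\mu_1^2\,\mathrm{cov}(x_2,x_3)}{(\mu_2+\mu_3)^4}\right].$$

Rewriting the $x_i$, $\mu_i$, and $\sigma_i^2$ terms in their original form gives the result.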
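The variance approximation in Statement A1 is straightforward to evaluate numerically. The sketch below uses a hypothetical function name and argument names. It takes the variances of the sample means as inputs, so a caller would pass, for example, $\sigma^2/N$ for `var_p`. The example values are invented.

```python
def asr_variance(mu_p, mu_b, mu_a, var_p, var_b, var_a,
                 cov_pb=0.0, cov_pa=0.0, cov_ba=0.0):
    """First-order variance of ASR = ((Y_B + A) - Y_P) / (Y_B + A).

    var_p and var_b are variances of the sample means (e.g. sigma^2 / N);
    var_a is the variance term for the adjustment factor A.
    """
    d = mu_b + mu_a
    own = var_p / d**2 + mu_p**2 * var_b / d**4 + mu_p**2 * var_a / d**4
    cross = (-mu_p * cov_pb / d**3
             - mu_p * cov_pa / d**3
             + mu_p**2 * cov_ba / d**4)
    return own + 2.0 * cross


if __name__ == "__main__":
    v, d, rho = 100.0, 10_000.0, 0.67  # illustrative values only
    # Only performance-year spending random (Approach 1)
    approach1 = asr_variance(d, 9_700.0, 300.0, v, 0.0, 0.0)
    # Baseline spending also random, with within-patient correlation rho (Approach 2)
    approach2 = asr_variance(d, 9_700.0, 300.0, v, v, 0.0, cov_pb=rho * v)
    print(approach1, approach2)
```

Under the null hypothesis with equal variances, the second case reduces to $(\sigma^2/N)(2 - 2\rho)/(\mu_B + \mu_A)^2$, so it falls below the first case exactly when $\rho > 0.5$, in line with Statement A3.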

Statement A2

$$\mathrm{cov}(\bar{Y}_P, \bar{Y}_B) = \rho\,\sigma_P \sigma_B / N$$

Proof By definition, $\rho = \mathrm{cov}(Y_j^B, Y_j^P) / (\sigma_B \sigma_P)$, where $Y_j^B$ is the level of health spending for person $j$ in the baseline period and $Y_j^P$ is the level of health spending for person $j$ in the performance period. Also by definition, $\mathrm{cov}(\bar{Y}_P, \bar{Y}_B) = E[(\bar{Y}_B - \mu_B)(\bar{Y}_P - \mu_P)]$. But this can be rewritten as

$$E\left[\left(\frac{1}{N}\sum_i Y_i^B - \mu_B\right)\left(\frac{1}{N}\sum_i Y_i^P - \mu_P\right)\right] = E\left[\left(\frac{1}{N}\sum_i y_i^B\right)\left(\frac{1}{N}\sum_i y_i^P\right)\right],$$

where $y_i^B$ is the deviation of $Y_i^B$ from its mean and $y_i^P$ is the deviation of $Y_i^P$ from its mean. Therefore, we can write

$$\mathrm{cov}(\bar{Y}_P, \bar{Y}_B) = \frac{1}{N^2} E\left[(y_1^B + \dots + y_N^B)(y_1^P + \dots + y_N^P)\right] = \frac{1}{N^2}\, N\, \mathrm{cov}(Y_j^B, Y_j^P) = \rho\,\sigma_P \sigma_B / N.$$

The second to last step derives from the assumptions that $\mathrm{cov}(Y_j^B, Y_k^P) = 0$ for all $k \neq j$ and that $\mathrm{cov}(Y_j^B, Y_j^P)$ is the same for every patient $j$.

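Statement A3 Suppose $\sigma_P^2 = \sigma_B^2 = \sigma^2$. Then under the null hypothesis that $\mu_P = \mu_B + \mu_A$,

$$
\frac{\mu_P^2\,(\sigma_B^2/N)}{(\mu_B+\mu_A)^4} - 2\left[ \frac{(\mu_P\,\rho\,\sigma_P \sigma_B)/N}{(\mu_B+\mu_A)^3} \right] < 0
$$

if and only if $\rho > 0.5$.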

Proof Using the assumption that $\sigma_P^2 = \sigma_B^2 = \sigma^2$, the term of interest can be rewritten as

$$
\frac{\mu_P\,\sigma^2}{N(\mu_B+\mu_A)^3} \left[ \frac{\mu_P}{\mu_B+\mu_A} - 2\rho \right].
$$

This term will be negative if and only if $\left[ \frac{\mu_P}{\mu_B+\mu_A} - 2\rho \right] < 0$. Under the null hypothesis, this means that $(1 - 2\rho) < 0$, which occurs if and only if $\rho > 0.5$.
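A small numerical illustration of Statement A3 follows. It is a sketch under assumed inputs: the function name `a3_term` and the dollar figures are hypothetical and are not taken from the paper. The function evaluates the term of interest directly, so its sign can be inspected on either side of the 0.5 cutoff.

```python
def a3_term(mu_p, mu_b, mu_a, sigma, rho, n):
    """Baseline-variance term minus twice the covariance term from Statement A3.

    Assumes equal variances: sigma_P^2 = sigma_B^2 = sigma^2.
    """
    d = mu_b + mu_a
    variance_part = mu_p**2 * (sigma**2 / n) / d**4
    covariance_part = 2 * (mu_p * rho * sigma * sigma / n) / d**3
    return variance_part - covariance_part


# Null hypothesis: mu_P = mu_B + mu_A (hypothetical per capita spending values).
mu_b, mu_a = 9000.0, 500.0
mu_p = mu_b + mu_a
for rho in (0.3, 0.5, 0.7):
    print(rho, a3_term(mu_p, mu_b, mu_a, 15000.0, rho, 5000))
```

With these inputs the term is positive for the correlation below 0.5, negative above it, and zero up to floating-point rounding at exactly 0.5.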

Statement A4

$$
P_1 = 1 - \Phi\left( \frac{(k - s)(\mu_B + \mu_A)}{\sigma/\sqrt{N}} \right)
$$

Proof By definition $P_1 = P(\mathrm{ASR} > k)$ under Approach 1. Putting ASR into standardized form, we have

$$
P(\mathrm{ASR} > k)
= P\left( \frac{\mathrm{ASR} - E(\mathrm{ASR})}{\sqrt{V(\mathrm{ASR})}} > \frac{k - E(\mathrm{ASR})}{\sqrt{V(\mathrm{ASR})}} \right)
= 1 - \Phi\left( \frac{k - E(\mathrm{ASR})}{\sqrt{V(\mathrm{ASR})}} \right)
= 1 - \Phi\left( \frac{(k - s)(\mu_B + \mu_A)}{\sigma/\sqrt{N}} \right).
$$
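The formula in Statement A4 is straightforward to evaluate. The following is a minimal sketch, assuming that s denotes the expected savings rate E(ASR), as the last step of the proof implies; the function name `p1` and the input values are hypothetical. It uses the standard normal CDF from Python's standard library.

```python
from math import sqrt
from statistics import NormalDist


def p1(s, k, mu_b, mu_a, sigma, n):
    """Probability that ASR exceeds the threshold k under Approach 1 (Statement A4).

    s: expected savings rate E(ASR); k: MSR threshold;
    mu_b + mu_a: adjusted baseline mean; sigma: per-person spending SD;
    n: number of patients.
    """
    z = (k - s) * (mu_b + mu_a) / (sigma / sqrt(n))
    return 1.0 - NormalDist().cdf(z)


# Hypothetical inputs: no true savings (s = 0) and a 2% threshold.
for n in (1000, 5000, 20000):
    print(n, round(p1(0.0, 0.02, 9000.0, 500.0, 15000.0, n), 4))
```

With s = 0 the returned value is the probability of rewarding a group that achieved no true savings, and it shrinks as the number of patients grows.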

Statement A5

$$
P_2 = \Phi\left[ \frac{(s - k)(\mu_B + \mu_A)}{\sqrt{(\sigma^2/N)\,(k^2 + 2(1 - k)(1 - \rho))}} \right]
$$

Proof As described in the main text, under Approach 2, ASR is the ratio of two normally distributed random variables $Z = (\bar{Y}_B + A) - \bar{Y}_P$ and $W = \bar{Y}_B + A$. Although the distribution of $Z/W$ is fairly complex, it can be greatly simplified if we assume that $P(W > 0) = 1$ (Marsaglia 1965, 2006). This assumption is clearly innocuous in the shared savings context, as per capita health spending is always greater than zero. To determine the likelihood that a provider group would be rewarded for savings, we must calculate the probability that $Z/W > k$ for a given MSR threshold $k$. If we define a new random variable $Q = kW - Z$, we can apply the formula developed by Marsaglia (1965, 2006) to calculate the required probability as

$$
P_2 = P(\mathrm{ASR} > k) = P(Q < 0) = \Phi(-\mu_Q/\sigma_Q),
$$

where $\mu_Q$ and $\sigma_Q$ are the mean and standard deviation of $Q$. Through direct substitution and calculation, it can be shown that $-\mu_Q = (s - k)(\mu_B + \mu_A)$ and $\sigma_Q = \sqrt{(\sigma^2/N)[k^2 + 2(1 - k)(1 - \rho)]}$, which completes the proof.
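The "direct substitution and calculation" step can be sketched as follows. The derivation rests on assumptions inferred from the stated result rather than quoted from the main text: A is treated as a fixed adjustment in this calculation, the two periods share a common variance, the covariance of the two sample means is taken from Statement A2, and s is defined so that the performance-period mean equals (1 - s) times the adjusted baseline mean.

```latex
\begin{align*}
Q &= kW - Z = \bar{Y}_P - (1-k)(\bar{Y}_B + A), \\
\mu_Q &= \mu_P - (1-k)(\mu_B+\mu_A)
       = (1-s)(\mu_B+\mu_A) - (1-k)(\mu_B+\mu_A)
       = (k-s)(\mu_B+\mu_A), \\
\sigma_Q^2 &= \frac{\sigma^2}{N} + (1-k)^2\frac{\sigma^2}{N} - 2(1-k)\frac{\rho\,\sigma^2}{N} \\
           &= \frac{\sigma^2}{N}\left[ 1 + (1-k)^2 - 2(1-k)\rho \right]
            = \frac{\sigma^2}{N}\left[ k^2 + 2(1-k)(1-\rho) \right].
\end{align*}
```

The last equality uses the expansion of 1 + (1 - k)² as k² + 2(1 - k). Because W is positive, the event Z/W > k is the same as Q < 0, and the normality of Q gives the stated probability.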

Statement A6 Let $x = s - k$ and define $f(x) = P_2 - P_1$. Then $f(x) = \Phi(Jx) - \Phi(Ix)$, where

$$
J = \frac{\mu_B + \mu_A}{\sqrt{(\sigma^2/N)\,(k^2 + 2(1 - k)(1 - \rho))}}
\quad \text{and} \quad
I = \frac{\mu_B + \mu_A}{\sigma/\sqrt{N}}.
$$

Proof Using the definitions of $J$ and $I$ and Statements A4 and A5, we have $f(x) = \Phi(Jx) - [1 - \Phi(-Ix)]$. Using the symmetry of the standard normal distribution, $\Phi(-Ix) = 1 - \Phi(Ix)$. Therefore, $f(x) = \Phi(Jx) - \Phi(Ix)$ as required.
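The identity in Statement A6 can be confirmed numerically. The sketch below is illustrative: the helper names (`p1`, `p2`, `f_closed_form`), the shorthand `d` for the adjusted baseline mean, and all input values are assumptions made for this example. It computes the difference of the two probabilities from Statements A4 and A5 and compares it with the closed form.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def p1(s, k, d, sigma, n):
    """Statement A4; d stands for mu_B + mu_A."""
    return 1.0 - PHI((k - s) * d / (sigma / sqrt(n)))


def p2(s, k, d, sigma, n, rho):
    """Statement A5."""
    sd_q = sqrt((sigma**2 / n) * (k**2 + 2 * (1 - k) * (1 - rho)))
    return PHI((s - k) * d / sd_q)


def f_closed_form(x, k, d, sigma, n, rho):
    """Statement A6: Phi(J*x) - Phi(I*x)."""
    j = d / sqrt((sigma**2 / n) * (k**2 + 2 * (1 - k) * (1 - rho)))
    i = d / (sigma / sqrt(n))
    return PHI(j * x) - PHI(i * x)


# Hypothetical inputs.
k, d, sigma, n, rho = 0.02, 9500.0, 15000.0, 5000, 0.4
for s in (0.0, 0.02, 0.05):
    direct = p2(s, k, d, sigma, n, rho) - p1(s, k, d, sigma, n)
    print(s, direct, f_closed_form(s - k, k, d, sigma, n, rho))
```

The two columns agree up to floating-point rounding for each value of s.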

Statement A7 The relationships listed in Table 1 in the main text follow directly from

the properties of f(x).

Proof First observe that $f(0) = \Phi(0) - \Phi(0) = 0.5 - 0.5 = 0$. Now suppose that $x > 0$. If $k < 1 - 2\rho$, then it can be shown with straightforward algebra that $I > J$, so that $Ix > Jx$ and therefore $f(x) < 0$ [since $\Phi(\cdot)$ is monotonically increasing]. Similarly, if $k > 1 - 2\rho$, then $f(x) > 0$. Now suppose instead that $x < 0$. If $k < 1 - 2\rho$, then $I > J \Rightarrow Ix < Jx$, and hence $f(x) > 0$. By similar reasoning, if $k > 1 - 2\rho$, then $f(x) < 0$. These relationships are summarized in Table 4.
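The algebra behind the comparison of I and J can be spelled out as follows. This is a sketch that assumes k < 1, which holds for any realistic MSR threshold; I and J share the same numerator, so only the denominators need to be compared.

```latex
\begin{align*}
I > J
&\iff \sqrt{k^2 + 2(1-k)(1-\rho)} > 1 \\
&\iff k^2 - 1 + 2(1-k)(1-\rho) > 0 \\
&\iff (1-k)\left[ 2(1-\rho) - (1+k) \right] > 0 \\
&\iff (1-k)(1 - 2\rho - k) > 0 \\
&\iff k < 1 - 2\rho \qquad (\text{since } 1-k > 0).
\end{align*}
```

Reversing each inequality gives I < J if and only if k > 1 - 2ρ, and equality of I and J when k = 1 - 2ρ.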

The contents of Table 1 in the main text can be derived from Table 4 by using the formulas $x = s - k$ and $f(x) = P_2 - P_1$.

Table 4 Values of f(x) under alternative assumptions

| | $x < 0$ | $x = 0$ | $x > 0$ |
|---|---|---|---|
| $k < 1 - 2\rho$ | $f(x) > 0$ | $f(x) = 0$ | $f(x) < 0$ |
| $k = 1 - 2\rho$ | $f(x) = 0$ | $f(x) = 0$ | $f(x) = 0$ |
| $k > 1 - 2\rho$ | $f(x) < 0$ | $f(x) = 0$ | $f(x) > 0$ |
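The sign pattern in Table 4 can be reproduced with a short script. This is a sketch under assumed inputs: the function names, the default values for the adjusted baseline mean `d`, the standard deviation, and the sample size, and the chosen grid of correlations are all hypothetical. The threshold is held at 2% and the correlation is varied so that each row of the table is visited once.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def f(x, k, rho, d=9500.0, sigma=15000.0, n=5000):
    """f(x) = Phi(J*x) - Phi(I*x) from Statement A6; d stands for mu_B + mu_A."""
    i = d / (sigma / sqrt(n))
    j = d / sqrt((sigma**2 / n) * (k**2 + 2 * (1 - k) * (1 - rho)))
    return PHI(j * x) - PHI(i * x)


def sign(value, tol=1e-12):
    """Return -1, 0, or +1, treating tiny floating-point residue as zero."""
    if abs(value) < tol:
        return 0
    return 1 if value > 0 else -1


def table_row(k, rho):
    """Signs of f(x) at x < 0, x = 0, and x > 0 for one (k, rho) pair."""
    return [sign(f(x, k, rho)) for x in (-0.01, 0.0, 0.01)]


k = 0.02  # hypothetical MSR threshold
for rho in (0.40, 0.49, 0.60):  # 1 - 2*rho = 0.20, 0.02, -0.20
    print(rho, table_row(k, rho))
```

Each printed list corresponds to one row of Table 4, read from left to right.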



References

Ash, A.S., Ellis, R.P., Pope, G.C., Ayanian, J.Z., Bates, D.W., Burstin, H., Iezzoni, L.I., MacKay, E., Yu, W.: Using diagnoses to describe populations and predict costs. Health Care Financ. Rev. 21(3), 7–28 (2000)

Bailit, M., Hughes, C.: Key design elements of shared-savings payment arrangements. Commonwealth Fund, New York (2011)

Bailit, M., Hughes, C., Burns, M., Freedman, D.H.: Shared-savings payment arrangements in health care: six case studies. Commonwealth Fund, New York (2012)

Burke, G.: Moving toward accountable care in New York. United Hospital Fund, New York (2013)

Casella, G., Berger, R.L.: Statistical inference, 2nd edn. Duxbury, Pacific Grove (2002)

Centers for Medicare & Medicaid Services: Medicare shared savings program: shared savings and losses and assignment methodology specifications, Version 2 April. CMS, Baltimore (2013)

Centers for Medicare & Medicaid Services: Pioneer Accountable Care Organization (ACO) model request for application. CMS, Baltimore (2011)

Centers for Medicare & Medicaid Services: Health expenditures by state of residence. Centers for Medicare & Medicaid Services. http://www.cms.gov/NationalHealthExpendData/downloads/resident-state-estimates.zip (2011b). Accessed 20 Dec 2012

DeLia, D., Cantor, J.C.: Recommended approach for calculating savings in the NJ Medicaid ACO demonstration project. Rutgers Center for State Health Policy, New Brunswick (2012)

DeLia, D., Hoover, D., Cantor, J.C.: Statistical uncertainty in the Medicare Shared Savings Program. Medicare Medicaid Res. Rev. 2(2), E1–E15 (2012)

Friedberg, M.: Option pricing: a flexible tool to disseminate shared savings contracts. Presented at the Academy Health annual research meeting (2013)

Gawande, A.: The hot spotters: can we lower medical costs by giving the neediest patients better care? The New Yorker. http://www.newyorker.com/reporting/2011/01/24/110124fa_fact_gawande?currentPage=all (2011). Accessed 31 July 2013

Iezzoni, L. (ed.): Risk adjustment for measuring health care outcomes, 4th edn. Health Administration Press, Chicago (2013)

Kauter, J.: Risk adjustment in the medicare ACO shared savings program. Presented at the Academy Health annual research meeting (2013)

Lewis, V.A., McClurg, A.B., Smith, J., Fisher, E.S., Bynum, J.P.W.: Attributing patients to accountable care organizations: performance year approach aligns stakeholders’ interests. Health Aff. (Millwood) 32(3), 587–594 (2013)

Linkins, K.W., Brya, J.J., Chandler, D.W.: Frequent users of health services initiative: final evaluation report. Lewin Group, Falls Church (2008)

Marsaglia, G.: Ratios of normal variables and ratios of sums of uniform variables. J. Am. Stat. Assoc. 60(309), 193–204 (1965)

Marsaglia, G.: Ratios of normal variables. J. Stat. Softw. 16(4), 1–10 (2006)

McGinnis, T., Small, D.M.: Accountable care organizations in medicaid: emerging practices to guide program design. Center for Health Care Strategies Inc., Hamilton (2012)

Medicare Program; Medicare Shared Savings Program: Accountable Care Organizations. 76 Fed. Reg. 67802 (to be codified at 42 C.F.R. pt. 425) (2011)

Mehrotra, A., Adams, J.L., Thomas, J.W., McGlynn, E.A.: The impact of different attribution rules on individual physician cost profiles. Ann. Intern. Med. 152(10), 649–654 (2010)

New Jersey Public Law: Medicaid accountable care demonstration project, Chapter 114 (2011)

Okin, R.L., Boccellari, A., Azocar, F., Shumway, M., O’Brien, K., Gelb, A., Kohn, M., Harding, P., Wachsmuth, C.: The effects of clinical case management on hospital service use among ED frequent users. Am. J. Emerg. Med. 18(5), 603–608 (2000)

Owen, R.: Health care delivery systems demonstration project. Presented at the seventh national medicaid congress (2012)

Pantely, S.E.: Whose patient is it? Patient attribution in ACOs. Milliman healthcare reform briefing paper (2011)

Pope, G.: Attributing patients to physicians for pay for performance. Chapter 7. In: Cromwell, J., Triolini, M.G., Pope, G., Mitchell, J.B., Greenwald, L.M. (eds.) Pay for performance in health care: methods and approaches. RTI Press Publication No. BK-002-1103. RTI Press, Research Triangle Park (2011)

Probert, M.: Maine care: accountable communities initiative. Presentation at the seventh national medicaid congress (2012)

Richman, B.D., Schulman, K.A.: A cautious path forward on accountable care organizations. JAMA 305(6), 602–603 (2011)



Robinson, J.C.: Accountable care organization for PPO patients: challenge and opportunity in California. Integrated Healthcare Association, Oakland (2011)

Sadowski, L.S., Kee, R.A., VanderWeele, T.J., Buchanan, D.: Effect of a housing and case management program on emergency department visits and hospitalizations among chronically ill homeless adults: a randomized trial. JAMA 301(17), 1771–1778 (2009)

Thomas, W.J., Grazier, K.L., Ward, K.: Comparing accuracy of risk-adjustment methodologies used in economic profiling of physicians. Inquiry 41(2), 218–231 (2004)

van Belle, G.: Statistical rules of thumb. Wiley, New York (2002)

Weissman, J.S., Bailit, M., D’Andrea, G., Rosenthal, M.B.: The design and application of shared savings: lessons from early adopters. Health Aff. (Millwood) 31(9), 1959–1968 (2012)

Yu, W.W., Machlin, S.: Examination of skewed health expenditure data from the medical expenditure panel survey (MEPS). Working Paper No. 04002. Agency for Healthcare Research and Quality, Rockville (2004)

