Leaving it to chance: the effects of random variation in shared savings arrangements
Derek DeLia
Received: 22 December 2012 / Revised: 1 August 2013 / Accepted: 30 October 2013 / Published online: 10 November 2013 © Springer Science+Business Media New York 2013
Abstract Shared savings arrangements are designed to financially reward provider
groups that reduce healthcare spending through improved care coordination. A major
concern with these arrangements is that annual changes in spending are subject to a variety
of random factors that are unrelated to care coordination efforts. As a result, resources can
be misallocated if providers who are unsuccessful at controlling spending are inappropriately
rewarded and providers who are successful are inappropriately denied rewards.
This paper provides a systematic analysis of the role of random variation using a general
statistical model based on shared savings arrangements that are currently evolving in the
public and private sectors. The model focuses specifically on the variance of the average
savings rate (ASR), which is the quantity used to determine whether and by how much a
provider group will be rewarded. Variance in the ASR is a major driver of the probabilities
of Type I error (i.e., inappropriately rewarding providers) and Type II error (i.e., inappropriately
failing to reward providers), which can lead to major resource misallocations.
We find that the probabilities of Type I and Type II errors associated with common
approaches to savings measurement can be quite high, often exceeding 10 or 25 %,
respectively. We also find that the likelihood of both types of errors can be substantially
reduced through careful planning and design of savings measurement schemes before
payers and providers enter into shared savings agreements.
Keywords Shared savings · Healthcare spending · Type I and Type II errors · Sample size · Accountable care · Patient centered medical homes
1 Introduction
Many are looking to shared savings arrangements as a way to slow the growth in healthcare
spending while simultaneously improving the quality of care. Under such arrangements,
D. DeLia (✉), Rutgers Center for State Health Policy, 112 Paterson St., Room 540, New Brunswick, NJ 08901, USA; e-mail: [email protected]
Health Serv Outcomes Res Method (2013) 13:219–240, DOI 10.1007/s10742-013-0110-9
provider groups that improve the coordination and delivery of care in a way that reduces
per patient spending below a targeted amount would be given a share of these savings.
Shared savings are often tied to healthcare delivery reforms such as Patient Centered
Medical Homes (PCMHs) and Accountable Care Organizations (ACOs).
Shared savings models vary in the amount of financial risk that providers are expected
to bear. In one-sided models, provider groups are rewarded for savings but are not
penalized if spending grows faster than the targeted amount. In two-sided models, provider
groups agree to pay penalties for larger-than-allowed spending increases in exchange for
the opportunity to earn greater rewards from savings. Some view shared savings
arrangements as a temporary bridge between prevailing fee-for-service reimbursement and
more prospective payment mechanisms such as bundled payments and global capitation,
which place greater financial risk onto providers (Robinson 2011; Weissman et al. 2012).
Nevertheless, this transition may take many years to unfold across the entire U.S. health
system. As a result, shared savings mechanisms are likely to evolve and remain in the
health system for many years to come.
A number of shared savings arrangements have been introduced into Medicare as part of
the Patient Protection and Affordable Care Act (PPACA). Under the Medicare Shared
Savings Program (MSSP), Medicare will enter into shared savings arrangements with
newly formed ACOs (Medicare Program 2011). Shared savings are also integral to
Medicare’s Pioneer ACO Program in which providers that already have substantial
experience in care coordination enter into more sophisticated shared savings arrangements
(Centers for Medicare & Medicaid Services 2011a). Although rules for measuring and
distributing savings achieved by Medicare ACOs have been developed, the final rules for
the MSSP allude to the idea that Medicare policy toward ACOs is likely to evolve with
accumulated experience (Medicare Program 2011).
Outside of Medicare, many states are using, or publicly contemplating the use of,
different types of shared savings arrangements as part of their efforts to control costs and
improve the delivery of care within Medicaid (DeLia and Cantor 2012; Owen 2012;
Probert 2012; Burke 2013; McGinnis and Small 2012). Typically, the approach to incorporating
shared savings within Medicaid is similar to that used in the MSSP, but the details
underlying these approaches vary widely from state to state (DeLia and Cantor 2012;
Owen 2012; Probert 2012).
In the commercial insurance market as well, there is growing interest in the shared
savings concept. Although the details of commercial arrangements are typically not disseminated
widely, a number of recent reports have documented a variety of newly
emerging shared savings arrangements between providers and private insurers (Robinson
2011; Weissman et al. 2012; Bailit et al. 2012; Burke 2013). These developments have led
to enormous variability in shared savings measurement and a variety of approaches to the
shared distribution of savings.
Despite this variability, however, a number of common elements have emerged in the
development of shared savings mechanisms. One key element involves the demonstration
that savings (or losses) have actually occurred. Broadly speaking, savings are measured by
comparing per capita spending for a defined group of patients before and after the relevant
delivery reform (e.g., formation of an ACO) has been implemented.
A major concern in this process is the role played by random variation in per capita
healthcare spending. Specifically, this spending could rise or fall in a given year due to a
wide variety of random factors unrelated to the provider group’s care coordination. For
example, spending could fall if patients respond more favorably than usual to ordinary
treatment or if unusually sick patients from the baseline period return to more common
utilization patterns (e.g., regression to the mean). Alternatively, spending may increase if
patients’ responses to treatments are less favorable than usual or there is a local epidemic
of infectious disease.
In response to random variation, shared savings arrangements often include a threshold
level of savings (e.g., 2 %) that must be observed before the savings are counted
as “real.” A key issue is where the threshold should be set. If it is set too low, then there is
a substantial risk that providers will receive rewards when no real savings are achieved.
But if the threshold is set too high, then there is a substantial risk that providers will fail to
be rewarded when real savings are achieved.
Clearly, lower thresholds are advantageous to providers, while higher thresholds are
advantageous to payers. While the exact threshold level might be negotiated by providers
and payers, it is important for both sides to have a clear understanding of how the relative
risks change in response to different threshold levels. To facilitate this understanding, it is
useful to frame these risks in terms of probabilities of Type I and Type II error in statistical
hypothesis testing. In this framework, the null hypothesis is that the delivery reform under
investigation has no effect on per capita healthcare spending.
A recent analysis of minimum savings thresholds in the MSSP finds that the probabilities
of Type II error are often much larger than the corresponding probabilities of Type
I error, making the statistical risks faced by providers larger than those faced by the
Medicare program (DeLia et al. 2012). This prior analysis also suggests that the extent of
random variation in per capita healthcare spending may have been underestimated when
setting the minimum savings thresholds for the MSSP. Failure to adequately recognize and
respond to random variation in shared savings arrangements is an important problem that
can lead to misallocation of resources away from efficient healthcare providers and toward
inefficient ones, a process that would directly undermine the overarching goals of
healthcare payment and delivery reform.
The current paper extends the prior work by DeLia et al. (2012) in two fundamental ways.
First, it considers the role of random variation in per capita healthcare spending from a much
wider range of sources and ties this variation to specific design elements of shared savings
arrangements. Second, it develops a general modeling framework to encompass shared savings
arrangements that are currently evolving in Medicaid and commercial insurance in addition to
Medicare ACO programs. The analysis is informed by current and evolving practice in public
and private sector shared savings arrangements. Based on these arrangements, we construct
statistical models that are needed to separate true provider performance from random variation
and state explicitly the key assumptions that are needed to implement these models.
Data that could measure the performance of the shared savings arrangements described
above are not publicly available as many of these arrangements are in the initial implementation
stages, while others are still in planning stages. Thus, the current analysis uses
analytic formulas and numerical simulations to derive the key findings of the paper. Based
on these findings, we discuss the data collection strategies that are needed to implement
and improve the statistical efficiency of each measured savings approach. We also discuss
the analytic and practical tradeoffs associated with these approaches and conclude with
points of departure for extended analyses.
2 Emerging approaches to savings measurement
Across the public and private sectors, the measurement of savings associated with delivery
reforms involves an assessment of the change in per capita spending within a defined
patient population before and after the relevant delivery reform is implemented. To
determine whether the observed change is attributable to the relevant reform, savings
measurement schemes typically include a comparison population or a predetermined target
to which the provider group is held. As mentioned above, shared savings formulas also
often include a minimum savings threshold that must be crossed to ensure that observed
savings are not the result of random variation.
The MSSP, which allows Medicare to enter into shared savings agreements with ACOs,
has established a very detailed methodology for measuring savings (Medicare Program
2011). Specifically, the Centers for Medicare & Medicaid Services (CMS) determines the
benchmark level of per capita spending within the ACO.1 This is done by taking a
weighted average of the most recent 3 years of per capita spending among patients who
would be assigned to the ACO according to pre-existing healthcare utilization patterns.2
The benchmark value is then “updated” using the projected absolute amount of growth in
national per capita expenditures for Parts A and B services under the original Medicare
FFS program. Average per capita expenditures within the ACO in the performance year are
then compared to the updated benchmark. If performance year expenditures are less than
this benchmark by a predetermined amount, then the ACO would be eligible for a financial
reward. The predetermined amount is based on a methodology to account for random
variation, which is described below. Similarly, in the two-sided model, if performance year
expenditures are sufficiently greater than the benchmark, then the ACO would pay a
penalty.
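The benchmark-and-compare steps described above can be sketched in a few lines of Python. This is a minimal illustration and not CMS's actual methodology: the function names and dollar figures are hypothetical, the 0.6/0.3/0.1 weights are taken from footnote 2, the inputs are assumed to be already risk adjusted and trended forward, and expressing the "predetermined amount" as a minimum savings rate is an assumption made for simplicity.

```python
def updated_benchmark(spend_by_year, projected_growth, weights=(0.1, 0.3, 0.6)):
    """Weighted baseline plus an absolute growth update.

    spend_by_year: per capita spending for the baseline years, oldest first
    (assumed already trended forward and risk adjusted).
    weights: oldest-to-newest weights; defaults follow footnote 2.
    """
    if len(spend_by_year) != len(weights):
        raise ValueError("need one spending value per weight")
    baseline = sum(w * y for w, y in zip(weights, spend_by_year))
    return baseline + projected_growth


def one_sided_outcome(performance_spend, benchmark, min_savings_rate):
    """True when observed savings, as a share of the benchmark, exceed the threshold."""
    savings_rate = (benchmark - performance_spend) / benchmark
    return savings_rate > min_savings_rate


# Hypothetical dollar figures, chosen only for illustration.
bench = updated_benchmark([10_000.0, 10_500.0, 11_000.0], projected_growth=500.0)
eligible = one_sided_outcome(11_000.0, bench, min_savings_rate=0.025)
```

In this hypothetical case the provider group spends less than the updated benchmark but does not clear the threshold, so the apparent savings would not be rewarded.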
The Medicare approach is often used as a template for many emerging Medicaid and
commercial ACOs that are currently under development (DeLia and Cantor 2012; Owen
2012; Probert 2012; Weissman et al. 2012; Bailit et al. 2012; Bailit and Hughes 2011).
These additional ACO programs, however, often deviate from the Medicare approach in a
variety of important details (DeLia and Cantor 2012; Owen 2012; Probert 2012; Weissman
et al. 2012; Bailit et al. 2012; Bailit and Hughes 2011). These details include the use of
fixed versus projected spending targets, development of benchmark populations, use or
non-use of risk adjustment, limits on the minimum number of patients assigned to provider
groups, and topcoding spending for high-cost outlier patients.3
In the next section, we use the most common elements of savings measurement
approaches to develop a general statistical model that is sufficiently flexible to accommodate
many of the variations in real world savings measurement schemes. We place
particular emphasis on the way in which the chosen scheme can create different sources of
random variation in savings measurement. With this model, we derive implications for the
likelihood of Type I and II errors when setting standards for the recognition of savings.
1 All spending amounts in the MSSP are risk adjusted using CMS’s Hierarchical Condition Categories, which were developed for risk adjustment of premiums paid to Medicare Advantage plans (Ash et al. 2000).
2 In this weighting scheme, the most recent year is weighted at 0.6, the middle year at 0.3, and the earliest year at 0.1. To adjust for medical cost inflation, the middle and least recent years are “trended forward” using the national growth rate in Medicare Part A and B expenditures among fee-for-service beneficiaries.
3 Topcoding, sometimes called truncation, refers to the process where patients with spending above a certain threshold (e.g., $100,000) have their actual spending amount replaced with the threshold amount to reduce their influence on the calculated mean.
3 Sources of random variation
To understand the role of random variation in establishing savings, we build on and extend
the analytic framework developed by DeLia et al. (2012) in their more narrowly focused
assessment of the MSSP. Using their notation, we define the following key variables:
(1) Average per capita baseline spending $\bar{Y}_B$
(2) Adjustment factor reflecting the target growth in per capita spending $A$
(3) Average per capita performance year spending $\bar{Y}_P$
Using this notation, DeLia et al. (2012) define the average savings rate (ASR) as
$$\mathrm{ASR} = \frac{(\bar{Y}_B + A) - \bar{Y}_P}{\bar{Y}_B + A}. \tag{1}$$
Savings are apparent if ASR > 0, reflecting the fact that performance year spending is
less than baseline spending plus the allowed growth factor. Similarly, losses are apparent if
ASR < 0. In the Medicare and other shared savings programs, ASR must cross a predetermined
threshold (often based on the number of patients assigned to the provider group)
for apparent savings (or losses) to be recognized as real.
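As a small sketch of Eq. 1 and the threshold rule, the following Python computes the ASR and checks it against a one-sided threshold. The function names are hypothetical; the baseline and adjustment values are chosen so that their sum matches the $11,762 target used later in the paper's scenario analysis, while the performance-year figure is an invented illustration.

```python
def average_savings_rate(y_base, adjustment, y_perf):
    """ASR = [(Y_B + A) - Y_P] / (Y_B + A), as in Eq. 1."""
    target = y_base + adjustment
    if target <= 0:
        raise ValueError("target spending must be positive")
    return (target - y_perf) / target


def recognized(asr, threshold):
    """One-sided rule: apparent savings count as real only if ASR exceeds the threshold."""
    return asr > threshold


# Baseline 10,981 plus adjustment 781 gives the 11,762 target; 11,350 is hypothetical.
asr = average_savings_rate(10_981.0, 781.0, 11_350.0)
passes_at_20k = recognized(asr, 0.025)  # MSSP threshold for 20,000 patients
passes_at_5k = recognized(asr, 0.039)   # MSSP threshold for 5,000 patients
```

The same observed ASR of about 3.5 % clears the threshold applied to a 20,000-patient group but not the stricter one applied to a 5,000-patient group.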
The final thresholds used in the MSSP are derived from a fairly rigid set of statistical
assumptions spelled out systematically by DeLia et al. (2012). The most important of these
assumptions in the context of this paper are: (1) inferences about the ASR are conditional
on observed values for $\bar{Y}_B$ and (2) the adjustment factor $A$ is known with certainty. Under
these assumptions, the only component of the ASR that is subject to random variation is $\bar{Y}_P$.
More specifically, $\bar{Y}_P$ can be modeled as
$$\bar{Y}_P = \frac{1}{N}\sum_{i=1}^{N} (\mu_P + \varepsilon_{Pi}),$$
where $\mu_P$ is the true underlying mean in per capita spending in the performance period,
$\varepsilon_{Pi}$ is the deviation from this mean for patient $i$ in the performance period, and $N$ is
the number of patients assigned to the provider group. Although the expected value of
$\bar{Y}_P$ is $\mu_P$ (i.e., $E(\varepsilon_{Pi}) = 0$), the actual value of $\bar{Y}_P$ will be different from $\mu_P$ by an
amount that is unobservable to the payer and provider group before they enter into a
shared savings agreement. Nevertheless, for a sufficiently large $N$, one can apply the
Central Limit Theorem to $\bar{Y}_P$ (or equivalently to $\bar{\varepsilon}_P$, which is the mean of the
$\varepsilon_{Pi}$ values that will ultimately be realized) to determine the
likelihood that $\bar{Y}_P$ and $\mu_P$ will differ by a given amount. This idea forms the basis for
setting the minimum savings rates (MSRs) that the MSSP requires ACOs to achieve before
they are given credit for savings (DeLia et al. 2012).
It is worth noting that the sample size needed to reliably use the Central Limit Theorem
depends in part on the level of skewness in the underlying distribution from which
observations are drawn. This is a particular concern for healthcare expenditure data, which
is widely known to be heavily skewed. Yu and Machlin (2004) examined this issue using
data from the Medical Expenditure Panel Survey (MEPS). They found that confidence
intervals based on the normal distribution provide reliable coverage probabilities for
sample sizes that are at least 4,000. Since the smallest sample size we consider in our
analysis below (and in the paper by DeLia et al. 2012) is 5,000, use of the Central Limit
Theorem is justified. Moreover, in shared savings programs that topcode outlier levels of
spending, the skewness problem is reduced even further.
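The dependence of normal-theory inference on sample size under skewness can be explored with a short simulation. The sketch below is not the Yu and Machlin (2004) analysis and does not use MEPS data: it draws hypothetical spending from a lognormal distribution (an assumed stand-in for skewed expenditures, with an arbitrary shape parameter) and reports how often a 95 % normal-theory confidence interval for the mean covers the true mean.

```python
import math
import random
from statistics import NormalDist


def normal_ci_coverage(n, reps, sigma_log=1.0, level=0.95, seed=0):
    """Share of simulated samples whose normal-theory CI covers the true mean.

    Spending is drawn from lognormal(0, sigma_log), an assumed skewed distribution.
    """
    rng = random.Random(seed)
    true_mean = math.exp(sigma_log ** 2 / 2)  # mean of lognormal(0, sigma_log)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(reps):
        sample = [rng.lognormvariate(0.0, sigma_log) for _ in range(n)]
        mean = sum(sample) / n
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        half_width = z * math.sqrt(var / n)
        if abs(mean - true_mean) <= half_width:
            hits += 1
    return hits / reps


# Compare a small panel with one near the 4,000-patient size cited above.
coverage_small = normal_ci_coverage(n=200, reps=200)
coverage_large = normal_ci_coverage(n=4000, reps=100)
```

Raising `sigma_log` makes the simulated distribution more skewed, which is one way to probe how sensitive the coverage is to the tail behavior that topcoding is meant to limit.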
In this paper, we relax the two assumptions described above and consider the effects of
random variation in all three key elements of the ASR ($\bar{Y}_B$, $A$, and $\bar{Y}_P$). Before doing so, it
is important to note that the sources of this variation are tightly linked to the specific design
features of shared savings arrangements. For example, if payers and providers can observe
baseline spending levels for the relevant group of patients, then the rules for measuring
savings on the basis of the ASR formula in Eq. 1 can be made conditional on observed
baseline spending. In other words, once observed, baseline spending ($\bar{Y}_B$) can be taken as
given and no longer subject to random variation.
In many instances, however, baseline spending is not likely to be given at the time when
stakeholders enter into shared savings arrangements for a variety of reasons. First, data lags
prevent the most timely spending data from being assembled and analyzed. Second, even if
providers have their own data in real time, they typically do not capture spending by their
patients in other healthcare settings. Third, depending on the design of the arrangement, the
composition of the patient group needed to calculate baseline spending may not be
observed until after the performance period has ended. This is the case under retrospective
patient assignment, which is mandatory in the MSSP and offered as an option for Medicare
Pioneer ACOs (Medicare Program 2011; Centers for Medicare & Medicaid Services
2011a). Specifically, patients are assigned to ACOs based on the patients’ primary care use
patterns in the performance period. This group of patients and their corresponding
spending amounts clearly are not observable at baseline. In this case, random variation in
$\bar{Y}_B$ would need to be modeled with a framework similar to the one described for $\bar{Y}_P$ (i.e.,
model $\bar{Y}_B$ as $\bar{Y}_B = \frac{1}{N}\sum_{i=1}^{N} (\mu_B + \varepsilon_{Bi})$ and apply the Central Limit Theorem).
Similarly, the adjustment factor (A) may or may not be a random variable depending on
the design of the shared savings arrangement. If A is based on contemporaneous spending
growth in a comparison population, then it will be a random variable. Conversely, if A is
clearly specified in advance, then it is deterministic. In practice, currently evolving shared
savings arrangements include a wide variety of random and deterministic adjustment
factors (Bailit and Hughes 2011; Bailit et al. 2012).
The effects of random variation are captured in the variance of the ASR. To calculate
this variance, we assume that $\bar{Y}_B$ is calculated from a random sample of $N$ patients who
have a healthcare spending distribution with mean $\mu_B$ and variance $\sigma_B^2$ (see footnote 4). Similarly, $\bar{Y}_P$ is
based on the same group of $N$ patients with spending mean $\mu_P$ and variance $\sigma_P^2$. Since the
same patients appear in both periods, we use the parameter $\rho$ to represent the correlation of
healthcare spending by the same patient over the two time periods and assume that $\rho \geq 0$,
which is commonly observed in practice. Finally, the random variable $A$ has mean
$\mu_A$ and variance $\sigma_A^2$.
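With this information, the variance of the ASR can be approximated using the Taylor
series linearization method (see Appendix) as given below:
$$V(\mathrm{ASR}) = \frac{\sigma_P^2/N}{(\mu_B+\mu_A)^2} + \frac{\mu_P^2(\sigma_B^2/N)}{(\mu_B+\mu_A)^4} + \frac{\mu_P^2(\sigma_A^2/N)}{(\mu_B+\mu_A)^4} + 2\left[\frac{-\mu_P\,\mathrm{cov}(\bar{Y}_P,\bar{Y}_B)}{(\mu_B+\mu_A)^3} + \frac{-\mu_P\,\mathrm{cov}(\bar{Y}_P,A)}{(\mu_B+\mu_A)^3} + \frac{\mu_P^2\,\mathrm{cov}(\bar{Y}_B,A)}{(\mu_B+\mu_A)^4}\right] \tag{2}$$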
4 In the MSSP, $\bar{Y}_B$ is based on a 3-year weighted average, while in other arrangements, it may be based on a non-weighted average or a single year of data. Although these different approaches to baseline measurement could be modeled within this analytic framework, they would add complexity without altering the main conclusions of the paper.
Notice that if the only random variable is $\bar{Y}_P$, then the variance and covariance terms
involving $\bar{Y}_B$ and $A$ will equal zero, and thus, $V(\mathrm{ASR}) = \frac{\sigma_P^2/N}{(\mu_B+\mu_A)^2}$, which is the result derived
by DeLia et al. (2012).
If A is set in such a way that it is independent of the activities of the provider group
potentially eligible for shared savings (as is likely the case in most arrangements), then the
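covariance terms involving $A$ are equal to zero. Moreover, as shown in the Appendix,
$\mathrm{cov}(\bar{Y}_P, \bar{Y}_B) = \rho\sigma_P\sigma_B/N$. Thus, Eq. 2 can be rewritten as:
$$V(\mathrm{ASR}) = \frac{\sigma_P^2/N}{(\mu_B+\mu_A)^2} + \frac{\mu_P^2(\sigma_B^2/N)}{(\mu_B+\mu_A)^4} + \frac{\mu_P^2(\sigma_A^2/N)}{(\mu_B+\mu_A)^4} - 2\left[\frac{(\mu_P\rho\sigma_P\sigma_B)/N}{(\mu_B+\mu_A)^3}\right] \tag{3}$$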
Here the effect of randomness in each of the key variables on V(ASR) becomes clear.
Random variation in $A$ adds the term $\frac{\mu_P^2(\sigma_A^2/N)}{(\mu_B+\mu_A)^4}$, which clearly increases V(ASR). Random
variation in $\bar{Y}_B$ adds the term $\frac{\mu_P^2(\sigma_B^2/N)}{(\mu_B+\mu_A)^4} - 2\left[\frac{(\mu_P\rho\sigma_P\sigma_B)/N}{(\mu_B+\mu_A)^3}\right]$. If $\rho = 0$, then this term clearly
increases V(ASR). However, if $\rho$ is sufficiently large, then random variation in $\bar{Y}_B$ could
decrease V(ASR).
Suppose, for example, that the provider group produces no savings and there is no
change in the variance of healthcare spending, which is in effect the null hypothesis that an
MSR threshold would be designed to confirm or reject. Then we can write $\mu_P = \mu_B + \mu_A$
and $\sigma_P^2 = \sigma_B^2 = \sigma^2$. Plugging this information into the term created by random variation in
$\bar{Y}_B$ shows that this term will be negative if and only if $\rho > 0.5$ (see Appendix). This result
is consistent with a well established rule that pairing observations in an experimental
setting reduces variance if and only if the outcome correlation within subjects exceeds 0.5
(van Belle 2002).
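The role of the correlation in Eq. 3 can be checked numerically. The sketch below implements Eq. 3 as a hypothetical helper `var_asr` and evaluates it under the null case just described (performance-year mean equal to the target, equal variances, deterministic adjustment factor). The standard deviation of 25,000 and the group size of 20,000 are illustrative assumptions, not values taken from the paper.

```python
def var_asr(mu_b, mu_a, mu_p, var_p, var_b, var_a, rho, n):
    """Eq. 3: linearized V(ASR) when covariances involving A are zero.

    Set var_b=0 (and rho=0) for a fixed baseline; set var_a=0 for a fixed adjustment.
    """
    d = mu_b + mu_a
    term_p = (var_p / n) / d ** 2
    term_b = mu_p ** 2 * (var_b / n) / d ** 4
    term_a = mu_p ** 2 * (var_a / n) / d ** 4
    term_cov = 2 * (mu_p * rho * var_p ** 0.5 * var_b ** 0.5 / n) / d ** 3
    return term_p + term_b + term_a - term_cov


# Null case: mu_P = mu_B + mu_A. Sigma and n are assumed for illustration only.
mu_b, mu_a, n = 10_981.0, 781.0, 20_000
sigma2 = 25_000.0 ** 2
fixed_baseline = var_asr(mu_b, mu_a, mu_b + mu_a, sigma2, 0.0, 0.0, 0.0, n)
random_baseline = {
    rho: var_asr(mu_b, mu_a, mu_b + mu_a, sigma2, sigma2, 0.0, rho, n)
    for rho in (0.0, 0.33, 0.5, 0.67)
}
```

Comparing `random_baseline[rho]` with `fixed_baseline` shows the crossover at a correlation of 0.5: treating the baseline as random inflates the variance below that value and shrinks it above.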
4 Hypothesis testing with minimum savings thresholds
As shown above, random variability in the components of the ASR can add to its variance.
Greater variance increases the probability of Type I error (i.e., rewarding savings that are
apparent but not real) and Type II error (i.e., failing to reward true savings that are not
apparent). To illustrate these impacts quantitatively, we calculate the probabilities of both
types of error under three approaches that reflect the sources of random variation in the
ASR, which are directly influenced by the design and data availability for savings measurement.
The three approaches are as follows:
• Approach 1: Only per capita spending in the performance period ($\bar{Y}_P$) is a random
variable, while baseline spending ($\bar{Y}_B$) and the adjustment factor ($A$) are deterministic.
• Approach 2: Per capita spending at baseline and in the performance period are random
variables, while the adjustment factor is deterministic.
• Approach 3: Per capita spending at baseline and in the performance period as well as
the adjustment factor are all random variables.
To simplify the calculations, we consider only one-sided models.
In our framework, a provider group is eligible for shared savings if its ASR exceeds a
threshold k. For illustrative purposes, we use the thresholds set by the MSSP in our
calculations below (although different thresholds could easily be used in other
arrangements). Specifically, we set k equal to 0.039, 0.025, or 0.022 for provider groups
with 5,000; 20,000; or 50,000 assigned patients, respectively. These thresholds were
developed so that the probability of Type I error would be 0.10, 0.05, or 0.01 for ACOs
with 5,000, 20,000, or 50,000 patients, respectively, under the assumption that per capita
spending in the performance period is the only random variable.
To assess the impact of random variation on the probability of Type I and II error, we
assume that $\mu_P = (1 - s)(\mu_B + \mu_A)$. In calculating the Type I error probability, we consider
the null hypothesis that the provider group generates no savings or losses, which is
equivalent to setting s = 0. The calculation of Type II error probability requires a specific
alternative hypothesis. For illustrative purposes, we consider as the alternative hypothesis
the case where the provider group generates a moderate savings of 3.5 %, which is
equivalent to setting s = 0.035.
In Approaches 1 and 2 (where the adjustment factor is deterministic), $A = \mu_A$ with
certainty. In Approach 3 (where the adjustment factor is a random variable), we specify a
probability distribution for $A$ that is centered around $\mu_A$ as described below.
Following the scenario analysis of DeLia et al. (2012), we assume that
$\mu_B + \mu_A$ = $11,762, which was the average level of per capita Medicare spending in
2010. To derive a value for $\mu_A$, we calculate the annual growth in per capita Medicare
spending from 2000 to 2010 using information from the Centers for Medicare &
Medicaid Services (2011b). We find that this growth varied from a minimum of $228
in 2000 to a maximum of $1,334 in 2006. Thus, we set $\mu_A = 781$, which is the
midpoint between these two numbers. In our analysis for Approach 3, we specify a
probability distribution for $A$ with mean $\mu_A = 781$ in a series of simulations described
in Sect. 4.2.
Since V(ASR) depends on the within-patient spending correlation coefficient $\rho$, we
consider different values of $\rho$ in each approach. Specifically, we examine $\rho$ values equal to
0 (indicating independence between $\bar{Y}_B$ and $\bar{Y}_P$) as well as 0.33 and 0.67, which straddle the
key $\rho$ value of 0.5 described above.
4.1 Random baseline with deterministic adjustment factor (Approaches 1 and 2)
To compare the probabilities of Type I and II error under Approaches 1 and 2, we derive
the corresponding probability distributions for ASR. We then use these distributions to
calculate the probabilities that ASR exceeds the given threshold $k$. To facilitate calculations,
we assume that the variance in healthcare spending is the same in the baseline and
performance periods, i.e., $\sigma_P^2 = \sigma_B^2 = \sigma^2$.
To derive the probability distribution for ASR under Approach 1, we rewrite Eq. 1 as
follows:
$$\mathrm{ASR} = 1 - \frac{\bar{Y}_P}{\bar{Y}_B + A}. \tag{4}$$
Under Approach 1, $\bar{Y}_B$ and $A$ are fixed at $\mu_B$ and $\mu_A$, respectively, and therefore,
$E(\mathrm{ASR}) = 1 - \frac{\mu_P}{\mu_B+\mu_A} = 1 - \frac{(1-s)(\mu_B+\mu_A)}{\mu_B+\mu_A} = s$. As shown above, $V(\mathrm{ASR}) = \frac{\sigma^2/N}{(\mu_B+\mu_A)^2}$ under
Approach 1. With this information, we can apply the Central Limit Theorem to $\bar{Y}_P$, which
means for a sufficiently large number of assigned patients, ASR will be approximately
normally distributed. As a result, we can calculate the probability that ASR exceeds the
threshold $k$ in Approach 1 as (see Appendix):
$$P_1 = 1 - \Phi\left[\frac{(k - s)(\mu_B + \mu_A)}{\sigma/\sqrt{N}}\right], \tag{5}$$
where $\Phi(\cdot)$ is the cumulative distribution function for a standard normal variable. Under
Approach 2, we can apply the Central Limit Theorem to $\bar{Y}_P$ and $\bar{Y}_B$, which makes ASR a
ratio of two normally distributed random variables. Using the approximation formulas
derived by Marsaglia (1965, 2006) for a ratio of normals (see Appendix), we calculate the
probability that ASR exceeds the threshold $k$ in Approach 2 as follows:
$$P_2 = \Phi\left[\frac{(s - k)(\mu_B + \mu_A)}{\sqrt{(\sigma^2/N)\left(k^2 + 2(1 - k)(1 - \rho)\right)}}\right]. \tag{6}$$
When $s = 0$ (i.e., no real savings exist), then $P_1$ and $P_2$ measure the probabilities of Type I
error in Approaches 1 and 2, respectively. When $s > 0$ (i.e., true savings do exist), then $P_1$
and $P_2$ measure the statistical power to detect savings in each approach, and therefore, the
probability of Type II error in Approach $j$ is $1 - P_j$.
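Equations 5 and 6 are straightforward to evaluate numerically. In the sketch below, the function names are hypothetical, and the spending standard deviation is not taken from the paper: it is backed out (as an assumption) from the statement that the 0.025 threshold was calibrated to give a Type I error of 0.05 for 20,000 patients under Approach 1, with the target spending level set to the $11,762 figure used in the scenario analysis.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def p_approach1(s, k, d, sigma, n):
    """Eq. 5: P(ASR > k) when only performance-year spending is random."""
    return 1 - PHI.cdf((k - s) * d / (sigma / sqrt(n)))


def p_approach2(s, k, d, sigma, n, rho):
    """Eq. 6: P(ASR > k) when baseline spending is random as well."""
    scale = sqrt((sigma ** 2 / n) * (k ** 2 + 2 * (1 - k) * (1 - rho)))
    return PHI.cdf((s - k) * d / scale)


def implied_sigma(k, d, n, alpha):
    """Sigma that makes the Approach 1 Type I error equal alpha (an inferred value)."""
    return k * d * sqrt(n) / PHI.inv_cdf(1 - alpha)


d = 11_762.0                      # mu_B + mu_A from the scenario analysis
k, n, alpha = 0.025, 20_000, 0.05
sigma = implied_sigma(k, d, n, alpha)

type1_a1 = p_approach1(0.0, k, d, sigma, n)
type1_a2 = {rho: p_approach2(0.0, k, d, sigma, n, rho) for rho in (0.0, 0.33, 0.67)}
type2_a1 = 1 - p_approach1(0.035, k, d, sigma, n)
type2_a2 = {rho: 1 - p_approach2(0.035, k, d, sigma, n, rho) for rho in (0.0, 0.33, 0.67)}
```

Because the standard deviation is inferred rather than reported, the resulting probabilities should be read as indicative of the direction of the effects and not as a reproduction of the paper's tables.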
To compare the statistical properties of the ASR under the two approaches, let $x = s - k$
and define $f(x) = P_2 - P_1$. Then as shown in the Appendix,
$$f(x) = \Phi(Jx) - \Phi(Ix), \tag{7}$$
where $J = \frac{\mu_B+\mu_A}{\sqrt{(\sigma^2/N)\left(k^2 + 2(1-k)(1-\rho)\right)}}$ and $I = \frac{\mu_B+\mu_A}{\sigma/\sqrt{N}}$. The function $f(x)$ can be used to compare
Approach 1 to Approach 2 in terms of Type I and II error probabilities for different
configurations of the true savings rate $s$, the threshold requirement $k$, and within-patient
spending correlation $\rho$. As shown below, much depends on whether $s > k$, which would
make $x > 0$ in Eq. 7, or whether the opposite is true. For example, if $s = 0$ and $f(x) > 0$, then
the probability of Type I error is greater in Approach 2 compared to Approach 1. Alternatively,
if $s > 0$ and $f(x) < 0$, then $P_2 < P_1$, and therefore, $1 - P_2 > 1 - P_1$. In this case,
the probability of Type II error would be greater in Approach 2 compared to Approach 1.
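The sign of $f(x)$ can be explored with a short sketch. From the definitions of $J$ and $I$, $J > I$ exactly when $k^2 + 2(1-k)(1-\rho) < 1$, which factors as $(k - 1)(k - (1 - 2\rho)) < 0$; for thresholds below 1 this is the condition $k > 1 - 2\rho$. The code below uses hypothetical function names and an assumed standard deviation of 25,000, which is not a figure from the paper.

```python
from math import sqrt
from statistics import NormalDist


def scale_factors(d, sigma, n, k, rho):
    """Return (J, I) as defined for Eq. 7."""
    i = d / (sigma / sqrt(n))
    j = d / sqrt((sigma ** 2 / n) * (k ** 2 + 2 * (1 - k) * (1 - rho)))
    return j, i


def f(s, k, d, sigma, n, rho):
    """f(x) = P2 - P1 = Phi(J x) - Phi(I x), with x = s - k."""
    j, i = scale_factors(d, sigma, n, k, rho)
    x = s - k
    phi = NormalDist().cdf
    return phi(j * x) - phi(i * x)


d, sigma, n = 11_762.0, 25_000.0, 20_000   # sigma is an illustrative assumption

# Type I comparison (s = 0): positive f means Approach 2 errs more often.
type1_gap_low_rho = f(0.0, 0.025, d, sigma, n, rho=0.0)
type1_gap_high_rho = f(0.0, 0.025, d, sigma, n, rho=0.67)

# Power comparison (s = 0.035): negative f means Approach 2 has more Type II error.
power_gap_k_below_s = f(0.035, 0.025, d, sigma, n, rho=0.0)
power_gap_k_above_s = f(0.035, 0.039, d, sigma, n, rho=0.0)
```

The group size is held fixed here to isolate the effect of the threshold; in the MSSP the 0.039 threshold applies to groups of 5,000 patients.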
A complete set of configurations with particular relevance for savings measurement
design is derived in the Appendix and shown in Table 1.
Table 1 shows that no approach dominates the other in terms of statistical accuracy across
all configurations. Nevertheless, some configurations are either impossible or less likely
than others in real world shared savings arrangements.
First, consider probabilities of Type I error. When $s = 0$, the configuration $s > k$ is
impossible, since a savings threshold would never be negative. When $s = k = 0$, then
according to Eqs. 5 and 6 the probability of Type I error is equal to 0.5 in both approaches.
If $k > s = 0$, then the role of within-patient spending correlation $\rho$ takes on particular
importance. If in this case, $\rho = 0$, then the last two rows of Table 1 would be irrelevant
since the condition $k \geq 1$ would never occur in practice, and thus, Approach 2 clearly
would be more prone to Type I error. The upper bound on $k$ is even tighter than 1, since
MSR thresholds in practice rarely exceed 0.05 (Weissman et al. 2012). Thus, the last two
rows of Table 1 are irrelevant for Type I error if $1 - 2\rho > 0.05$, or $\rho < 0.475$. Conversely,
the first two rows of Table 1 with respect to Type I error would be irrelevant if $\rho > 0.5$,
since this would create the condition that $k < 0$, which is also impossible in practice. Thus,
Approach 1 would produce greater Type I error in this case, which is consistent with the
statement above that Approach 1 has a larger value for V(ASR) when $\rho > 0.5$. If
$0.475 \leq \rho \leq 0.5$, then the relative probability of Type I error depends on where $k$ is set
relative to $\rho$.
In terms of Type II error probability, there is no difference between Approaches 1 and 2
whenever s = k or k =1 - 2q. As discussed in the Type I error analysis above, if
q\ 0.475, then the last two rows of Table 1 would be irrelevant. In this case, the relative
probability of Type II error depends on whether the MSR threshold k exceeds the
underlying savings rate s. Specifically, consistent with the first row of Table 1, Approach 2
would be more favorable to provider groups that generate true savings falling between 0
and the MSR threshold (it has the lower Type II error probability when s < k), whereas
Approach 1 is more favorable to provider groups that generate true savings above the MSR threshold.
Conversely, if ρ > 0.5, then, as also described above, the first two rows of Table 1 are not
applicable. As a result, the relationship between s and k in predicting the relative Type II
error is reversed relative to the case where ρ < 0.475. Similar to the Type I error analysis,
there is no clear prediction for relative Type II error when 0.475 ≤ ρ ≤ 0.5.
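The row conditions in Table 1 can be traced to a comparison of the reward probabilities in Statements A4 and A5 of the Appendix. The following is a reconstruction under the assumptions of those statements (common variance $\sigma^2$ and $k < 1$); the shorthand $g$ is introduced here for illustration and does not appear in the original text. Using $1 - \Phi(x) = \Phi(-x)$, the two reward probabilities can be written as

$$
P_1 = \Phi\left[\frac{(s-k)(\mu_B+\mu_A)}{\sigma/\sqrt{N}}\right], \qquad
P_2 = \Phi\left[\frac{(s-k)(\mu_B+\mu_A)}{(\sigma/\sqrt{N})\sqrt{g}}\right], \qquad
g = k^2 + 2(1-k)(1-\rho).
$$

The two expressions differ only through $g$, and

$$
g - 1 = (1-k)^2 - 2\rho(1-k) = (1-k)(1-k-2\rho).
$$

For $k < 1$, this gives $g > 1$ if and only if $k < 1 - 2\rho$. When $g > 1$, the argument of $\Phi$ in $P_2$ is pulled toward zero, so $P_2 > P_1$ when $s < k$ and $P_2 < P_1$ when $s > k$. With $s = 0 < k$ (Type I error), Approach 2 therefore rewards more often by chance; with $0 < s < k$, Approach 1 has the higher Type II error probability $1 - P$; and with $s > k$, Approach 2 does. The inequalities reverse when $k > 1 - 2\rho$, and the two approaches coincide when $s = k$ or $k = 1 - 2\rho$.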
Table 2 provides specific examples of relative Type I and II error probabilities using
MSR thresholds from the MSSP. In the cases where ρ = 0 or ρ = 0.33, the probability of
Type I error is always much greater in Approach 2. As expected, a large within-patient
spending correlation (ρ = 0.67) reverses this pattern so that the probability of Type I error
is always greater in Approach 1.
A qualitatively similar pattern appears for probabilities of Type II error for provider
groups with 20,000 or 50,000 assigned patients (i.e., error probabilities are higher in
Approach 2 for ρ = 0 or ρ = 0.33 but higher in Approach 1 for ρ = 0.67). The reverse
happens for provider groups with 5,000 assigned patients. This is because at 5,000 patients
the MSR threshold k is set at 0.039, which is greater than the underlying true savings rate
(s = 0.035) that we are considering in this example. We note as well that Type II error
probabilities are often much greater than Type I error probabilities under both approaches
regardless of the value for ρ.
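The closed-form probabilities in Statements A4 and A5 of the Appendix can be evaluated directly. The sketch below is one way to do so in Python and is not the author's code. Both formulas depend on the spending parameters only through the ratio (μB + μA)/σ, which this excerpt does not report; the value 0.465 used here is an assumption, back-solved from the Approach 1 Type I error column of Table 2. With that assumed ratio, the sketch approximately reproduces the Table 2 entries. All function and variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

# ASSUMPTION: ratio = (mu_B + mu_A) / sigma. The value 0.465 is back-solved
# from the Approach 1 Type I error column of Table 2; it is not stated in the text.
RATIO = 0.465
TRUE_SAVINGS = 0.035  # true savings rate s used in the Table 2 example
MSR = {5_000: 0.039, 20_000: 0.025, 50_000: 0.022}  # MSR threshold k by panel size


def reward_prob_approach1(s, k, n, ratio=RATIO):
    """P(ASR > k) when only performance-year spending is random (Statement A4)."""
    return 1.0 - PHI((k - s) * ratio * sqrt(n))


def reward_prob_approach2(s, k, n, rho, ratio=RATIO):
    """P(ASR > k) when baseline spending is random as well (Statement A5)."""
    scale = sqrt(k ** 2 + 2.0 * (1.0 - k) * (1.0 - rho))
    return PHI((s - k) * ratio * sqrt(n) / scale)


def error_probs(n, rho, s=TRUE_SAVINGS):
    """Type I error: reward when s = 0. Type II error: no reward when s > 0."""
    k = MSR[n]
    return {
        "type1_a1": reward_prob_approach1(0.0, k, n),
        "type1_a2": reward_prob_approach2(0.0, k, n, rho),
        "type2_a1": 1.0 - reward_prob_approach1(s, k, n),
        "type2_a2": 1.0 - reward_prob_approach2(s, k, n, rho),
    }


if __name__ == "__main__":
    for rho in (0.0, 0.33, 0.67):
        for n in MSR:
            row = error_probs(n, rho)
            cells = "  ".join(f"{name}={100 * p:5.1f}%" for name, p in row.items())
            print(f"rho={rho:.2f}  N={n:>6,}  {cells}")
```

Payers and providers with their own estimates of the mean and standard deviation of per capita spending could substitute the corresponding ratio for the assumed value.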
4.2 Impact of random baseline values with random adjustment factor (Approach 3)
Given the complexity of the ASR when all three key variables are random, we use a
simulation methodology to assess the probabilities of Type I and II errors in Approach 3.
Using the Central Limit Theorem, $\bar{Y}_B$ and $\bar{Y}_P$ are jointly normally distributed with mean
vector $(\mu_B, \mu_P)$ and variance–covariance matrix $V = \frac{\sigma^2}{N}\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$. Based on the recent
growth in per capita Medicare spending described above, we assume that A is uniformly
distributed over the interval from 288 to 1,334. We conduct additional simulations
assuming that A is uniformly distributed over the interval from 661 to 881 to illustrate the
effects when A has a smaller variance but the same mean.
Table 1 Relative probabilities of Type I and II errors under Approaches 1 and 2

              Approach with higher                Approach with higher
              Type I error probability            Type II error probability
              s < k        s = k    s > k         s < k        s = k    s > k
k < 1 − 2ρ    Approach 2   Same     N/A           Approach 1   Same     Approach 2
k = 1 − 2ρ    Same         Same     N/A           Same         Same     Same
k > 1 − 2ρ    Approach 1   Same     N/A           Approach 2   Same     Approach 1

To conduct the simulations, we take random samples from the joint distribution of $\bar{Y}_B$
and $\bar{Y}_P$ using the “drawnorm” command and a random sample from the distribution of
A using the “runiform” command in STATA 11.1. Following the methodology in Sect.
4.1, we conduct simulations assuming that providers have an assigned number of patients
equal to 5,000; 20,000; and 50,000.
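The simulation procedure described above can be sketched in Python using only the standard library. This is a hedged illustration, not the author's Stata code: the values of the baseline mean and the patient-level standard deviation (`MU_B`, `SIGMA`) are hypothetical placeholders, since this excerpt does not report them, and the number of draws and the seed are arbitrary. The sketch draws the two sample means from a bivariate normal distribution with correlation ρ, draws A from a uniform distribution, and counts how often the measured ASR exceeds the MSR threshold.

```python
import random
from math import sqrt

# HYPOTHETICAL parameters for illustration only; not taken from the paper.
MU_B = 10_000.0   # mean baseline per capita spending (assumed)
SIGMA = 23_000.0  # patient-level standard deviation of spending (assumed)


def simulate_reward_prob(s, k, n, rho, a_low, a_high,
                         mu_b=MU_B, sigma=SIGMA, draws=20_000, seed=1):
    """Monte Carlo estimate of P(ASR > k) when Y_B-bar, Y_P-bar and A are all random."""
    rng = random.Random(seed)
    mu_a = 0.5 * (a_low + a_high)
    mu_p = (1.0 - s) * (mu_b + mu_a)  # true savings rate s relative to the expected benchmark
    se = sigma / sqrt(n)              # standard error of each sample mean
    rewarded = 0
    for _ in range(draws):
        # Correlated standard normal draws for the baseline and performance means.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        y_b = mu_b + se * z1
        y_p = mu_p + se * (rho * z1 + sqrt(1.0 - rho ** 2) * z2)
        a = rng.uniform(a_low, a_high)  # random adjustment factor
        benchmark = y_b + a
        asr = (benchmark - y_p) / benchmark
        if asr > k:
            rewarded += 1
    return rewarded / draws


if __name__ == "__main__":
    ranges = {"small variance in A": (661, 881), "large variance in A": (288, 1334)}
    for label, (low, high) in ranges.items():
        type1 = simulate_reward_prob(0.0, 0.022, 50_000, 0.0, low, high)
        type2 = 1.0 - simulate_reward_prob(0.035, 0.022, 50_000, 0.0, low, high)
        print(f"{label}: Type I = {type1:.3f}, Type II = {type2:.3f}")
```

Because the spending parameters are assumed, the output should be read as illustrating the direction of the effect of variance in A rather than as a reproduction of the reported results.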
Table 3 shows the results of the simulations. With small variance in A, the probabilities
of Type I error are consistently larger in Approach 3 relative to Approach 2 (which are
consistently larger relative to Approach 1), especially when ρ = 0.67. With large variance
in A, the probabilities of Type I error are larger still. For provider groups with 20,000 or
50,000 assigned patients, the probabilities of Type II error are also consistently larger in
Approach 3 relative to the corresponding probabilities in Approaches 1 and 2, with larger
differences occurring in the case of large variance in A. For provider groups with 5,000
assigned patients, the probability of Type II error is similar across all three approaches,
regardless of the variance in A. This result is largely influenced by the fact that the assumed
effect size (s = 0.035) is less than the applicable minimum savings threshold (k = 0.039).
It is also of note that the parameter ρ, which was so influential under Approach 2, has
essentially no influence on Type I or II error probabilities under Approach 3.
4.3 Risk adjustment
Shared savings arrangements vary extensively according to whether and how they incor-
porate direct risk adjustment methods into their measured savings calculations. For
example, a recent survey of key informants involved with 27 shared savings arrangements
found that one-half do not use any risk adjustment techniques at all (Bailit and Hughes
2011). According to that study, “Models electing not to use risk adjustments were more
likely to be medical home initiatives or believed that risk adjustment was not imperative
because the shared-savings model involved comparing the provider’s performance to its
own past performance, with an assumption that the patient population risk burden would
not vary much from year to year (Bailit and Hughes 2011).”
Table 2 Probabilities of Type I and II errors with different sources of random variation in the average savings rate

Number of   MSR         Probability of Type I error       Probability of Type II error
patients    threshold   Approach 1 (%)   Approach 2 (%)   Approach 1 (%)   Approach 2 (%)
ρ = 0
5,000       0.039       10               17.8             55.2             53.8
20,000      0.025       5                12.0             25.5             31.9
50,000      0.022       1                5.1              8.8              16.7
ρ = 0.33
5,000       0.039       10               12.9             55.2             54.6
20,000      0.025       5                7.5              25.5             28.3
50,000      0.022       1                2.3              8.8              11.9
ρ = 0.67
5,000       0.039       10               5.4              55.2             56.6
20,000      0.025       5                2.0              25.5             20.6
50,000      0.022       1                0.2              8.8              4.6
Shared savings arrangements that do incorporate risk adjustment involve a widely
varying and complex series of calculations. For example, the MSSP uses a complex risk
adjustment methodology that involves different risk adjustment factors for Medicare
subpopulations (e.g. dual eligibles), which are multiplied by per capita spending amounts
within these subpopulations and then aggregated to produce a final risk-adjusted per capita
spending amount (Centers for Medicare & Medicaid Services 2013; Kauter 2013).5 Pub-
licly available reports of other shared savings arrangements that do use risk adjustment
include very little detail regarding methodology (Bailit and Hughes 2011; Bailit et al.
2012; Weissman et al. 2012; Burke 2013). For these reasons, we are not able to directly
assess the impact of risk adjustment on random variation in the ASR in a general analytic
model.
We can, however, provide some illustration of how our modeling is affected by the use
of risk-adjusted per capita spending amounts. We do this by using an approach based on
observed-to-expected outcome ratios, which is a well established and widely used risk
adjustment framework (Iezzoni 2013). In this framework, risk-adjusted per capita spending
in the performance period $\bar{Y}^*_P$ is written as:
$$\bar{Y}^*_P = (\bar{Y}_P / \bar{E}_P) \times \bar{Z}_P, \qquad (8)$$
where $\bar{E}_P$ is the average of the expected spending amounts derived from a multiple
regression model that uses patient diagnoses and demographic characteristics as independent
variables and $\bar{Z}_P$ is average per capita spending for a standard population (e.g., all
Medicare patients or all patients covered by a private insurer). Risk-adjusted spending in
the baseline period $\bar{Y}^*_B$ is defined analogously. Replacing $\bar{Y}_P$ and $\bar{Y}_B$ with $\bar{Y}^*_P$ and $\bar{Y}^*_B$,
respectively, in Eq. 4 would give the risk adjusted version of ASR, which we refer to as
ASR*.

Table 3 Probabilities of Type I and II errors with small and large variance in the adjustment factor for the average savings rate

Number of   MSR         Probability of Type I error         Probability of Type II error
patients    threshold   Small variance   Large variance     Small variance   Large variance
                        in A (%)         in A (%)           in A (%)         in A (%)
ρ = 0
5,000       0.039       17.6             22.5               54.2             54.2
20,000      0.025       12.5             24.8               32.0             39.4
50,000      0.022       6.1              26.1               18.1             35.5
ρ = 0.33
5,000       0.039       17.0             22.7               54.2             53.9
20,000      0.025       12.4             24.8               32.7             39.6
50,000      0.022       6.2              26.2               18.2             35.9
ρ = 0.67
5,000       0.039       17.0             21.7               54.7             51.9
20,000      0.025       12.6             24.5               32.4             39.3
50,000      0.022       6.2              26.8               18.1             35.9

5 Calculation of these risk adjustment factors includes rules that are designed to minimize “upcoding” of diagnoses during the performance period.
The added complexity makes closed form analysis of ASR* intractable. However, the
statistical properties of ASR* can be simplified and sources of its random variation min-
imized if several of its key components can be observed before the relevant parties enter
into a shared savings contract. Specifically, following the process outlined in Approach 1,
the baseline average expected per capita spending ($\bar{E}_B$) and per capita spending for the
standard population ($\bar{Z}_B$) could be calculated and known to all parties in advance. In
addition, the baseline standard population could be used as the performance year standard
population (i.e., set $\bar{Z}_P = \bar{Z}_B$) making this latter quantity known in advance as well. Thus,
the only sources of random variation in ASR* would be $\bar{Y}_P$ and $\bar{E}_P$. In this case, a group of
payers and providers contemplating a shared savings arrangement could use available data
to estimate the statistical properties of $\bar{Y}_P$ and $\bar{E}_P$ (e.g., means, variances) from prior
knowledge of $\bar{Y}_B$ and $\bar{E}_B$. This information would enable a series of scenario analyses
similar to what appears above to generate a common understanding of Type I and Type II
error risks embodied in a negotiated arrangement.
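As a minimal sketch of the observed-to-expected calculation in Eq. 8, the following Python functions compute risk-adjusted spending and the resulting ASR*. The ASR form used here, ((Y_B + A) − Y_P)/(Y_B + A), follows the expression given in the proof of Statement A1. The function names and all dollar amounts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def risk_adjusted_spending(observed_mean, expected_mean, standard_mean):
    """Eq. 8: observed-to-expected ratio scaled by standard-population spending."""
    return (observed_mean / expected_mean) * standard_mean


def asr_star(y_p, e_p, y_b, e_b, z_p, z_b, a):
    """Risk-adjusted average savings rate, with A as the adjustment factor."""
    y_p_star = risk_adjusted_spending(y_p, e_p, z_p)
    y_b_star = risk_adjusted_spending(y_b, e_b, z_b)
    benchmark = y_b_star + a
    return (benchmark - y_p_star) / benchmark


if __name__ == "__main__":
    # HYPOTHETICAL inputs: the standard population is held fixed (z_p = z_b),
    # as suggested in the text, so only y_p and e_p vary in the performance year.
    rate = asr_star(y_p=9_000, e_p=10_000, y_b=9_500, e_b=10_000,
                    z_p=8_000, z_b=8_000, a=200)
    print(f"ASR* = {rate:.4f}")
```

In this example, risk-adjusted baseline and performance spending are 7,600 and 7,200, so the benchmark is 7,800 and ASR* is 600/7,800, or roughly 0.077.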
5 Discussion
5.1 Policy and practice implications
This paper provides a systematic analysis of the role of random variation in the mea-
surement of savings derived from improved healthcare coordination. It focuses specifically
on the three main components of the ASR, which is the basis for many shared savings
arrangements that are currently evolving in the public and private sectors (Medicare
Program 2011; Centers for Medicare & Medicaid Services 2011a; Weissman et al. 2012;
Bailit et al. 2012; Bailit and Hughes 2011; DeLia and Cantor 2012; Owen 2012; Probert
2012; Burke 2013; McGinnis and Small 2012).
This analysis has broad implications for the design of shared savings arrangements. In
addition, payers and providers could apply their own data to the models developed in this
paper to estimate relevant parameters and better anticipate the risks that would be incurred
under different shared savings arrangements that they are considering.
We find that random variation in per capita healthcare spending in the baseline and
performance periods and in the adjustment factor used to set a benchmark for spending
growth all contribute substantially to the probability that a provider group will be inap-
propriately rewarded or not rewarded for their performance in controlling healthcare
spending. In technical terms, we find that the probabilities of Type I and Type II errors
associated with common approaches to savings measurement can be quite high, often
exceeding 10 or 25 %, respectively. As a result, there is substantial risk that resources will
be misallocated under many current shared savings arrangements.
Intuitively, we find that the risks of both kinds of errors are reduced when the number of
patients assigned to the relevant provider group is increased. But as discussed previously in
the context of ACOs, the creation of shared savings agreements with very large provider
groups raises concerns about the creation of excess pricing power and the potential for anti-
trust violations (Richman and Schulman 2011). In addition, effective care coordination
may be much harder to achieve in very large groups of providers with no prior history of
collaborating on patient care. Very large groups are also more prone to free-riding by less
productive providers on the achievements of more productive ones. Moreover, even large
provider groups must contend with the limited precision of small patient samples if they
wish to allocate financial rewards to their provider subgroups on the basis of relative
contribution to overall healthcare spending reductions. Statistical precision from very large
patient panels is even less tenable in the case of PCMHs.
The risks of resource misallocation due to random variation can be reduced by con-
sidering these risks in the design of the savings measurement approach and by making
specific data available before implementing a shared savings agreement. For example, we
found that the probability of Type I error is much smaller when the adjustment factor for
setting the benchmark for spending growth is deterministic instead of random. With only a
few special exceptions, we find that the probability of Type II error would also be smaller.
The adjustment factor can be made deterministic by simply specifying it in advance at the
beginning of the shared savings agreement. For example, the payer and provider group
might agree that providers receive credit for savings if per capita healthcare spending
grows by less than $300 in the performance year. Knowledge of the spending goal with
certainty could carry additional advantages as well. With a predetermined benchmark,
providers would have a clear target on which to focus their efforts and would be in a better
position to evaluate and calibrate their progress before the end of the performance year.
These advantages, however, would have to be viewed alongside the possibility that the
predetermined benchmark is set too high or too low relative to what other providers outside
the shared savings arrangement are able to achieve.
With a predetermined value for benchmark spending growth, we find that the proba-
bilities of Type I and II errors might be reduced even further if the correlation of healthcare
spending by the same patients over time is especially strong (i.e., more than 0.5) and
incorporated into the measured savings formula.
The use of within-patient spending correlation to improve the statistical precision of the
ASR carries the implicit assumption that the same patients appear in the baseline and
performance periods, an assumption that we have made in the modeling above. In some
real-world shared savings arrangements, this assumption is clearly satisfied as savings are
measured only for groups of patients that meet this criterion (Weissman et al. 2012; Bailit
and Hughes 2011; Bailit et al. 2012). But in other arrangements, such as the MSSP, a
slightly different group of patients may appear in the baseline and performance periods
making it more difficult to exploit the information available from the within-patient
spending correlation.
In contrast, given a small within-patient spending correlation, savings can be measured
with greater statistical precision if performance year spending is the only source of random
variation. For this to be the case, the payer and provider group would need to know the
baseline values of per capita healthcare spending with certainty before entering into the
shared savings agreement. This might be done, for example, with claims data for a clearly
defined set of patients for whom savings will be measured.
The incorporation of these principles into real-world shared savings arrangements
depends crucially on methods used to assign patients to providers. Although many patient
assignment algorithms exist (Mehrotra et al. 2010; Pope 2011; Pantely 2011; Lewis et al.
2013), the key issue in this paper is whether the patient assignment method allows for the
measurement of key parameters in the ASR formula before the shared savings arrangement
is implemented. For example, the MSSP assigns patients to ACOs retrospectively at the
end of the performance period (Medicare Program 2011). Since the patient population is
not known until the end of the performance period, it is impossible to assess variation in
the ASR conditional on observed baseline spending at the beginning of the contract period.
In contrast, providers in the Medicare Pioneer ACO Program can choose prospective
assignment where the patient base is known at the beginning of the contract period
(Centers for Medicare & Medicaid Services 2011a). Similarly, many commercial ACOs
and PCMHs use prospective assignment as well (Weissman et al. 2012; Bailit and Hughes
2011; Bailit et al. 2012).
Another variation, used by Medicaid ACO programs in New Jersey and Colorado,
assigns patients to providers geographically based on patient residence (New Jersey Public
Law 2011; McGinnis and Small 2012). Although the geographic boundaries in these
arrangements are well defined, population mobility (and mortality) creates a situation
where the baseline and performance year populations can be somewhat different. Never-
theless, geographic assignment clearly makes it possible to calculate baseline per capita
spending in a way that it is known to participants before the shared savings contract period.
Patient assignment also affects the extent to which there may be endogenous patient
switching between provider networks in ways that may be related to healthcare quality or
spending per patient. For example, with retrospective assignment, patients who receive
poor service from a given provider network may shift the majority of their care to other
providers and thus would no longer be assigned to the original network. With prospective
or geographic assignment, providers would still be held responsible for patients who were
originally assigned to them (and who remain in the relevant service area), even if these
patients decided to receive care elsewhere.
One of the driving factors behind the probabilities of Type I and II errors in our analysis
is the minimum savings rate (MSR), which is the threshold that a provider group’s mea-
sured ASR must cross before the group is credited for producing savings. We observe in
particular that the relationship between statistical precision and the sources of random
variation in the ASR depends on whether the MSR threshold is above or below the true
ASR. This observation raises the more general question about how the MSR threshold
should be set in the first place. Our numerical analysis is based on MSR thresholds set by
the Medicare Shared Savings Program (MSSP), which implicitly assumes that the only
source of random variation in the ASR derives from performance year per capita healthcare
spending (Medicare Program 2011). These thresholds were intended to achieve specified
levels of Type I error with no direct consideration of Type II error (Medicare Program
2011).
The analyses in this paper suggest that more explicit attention to the tradeoff between
these two errors in the setting of MSR thresholds is warranted. At issue is the relative cost
of rewarding a provider group that did not achieve any savings versus denying a reward to
a provider group that did achieve savings. Part of these costs might involve what provider
groups are expected to do with the rewards that they receive. For example, in the NJ
Medicaid ACO demonstration, ACOs are required to reinvest any rewards they earn into
future care and access improvements (New Jersey Public Law 2011). Therefore, payments
made to providers as a result of Type I error might still produce some population health
benefits (albeit at higher costs) even if awarded erroneously. In other shared savings
arrangements, financial rewards might be used to enhance provider income. While this
second use of financial rewards is appropriate in certain contexts, it creates different risks
in terms of resource misallocation in the presence of Type I error.
Similarly, provider groups with very little experience in risk-based contracting, par-
ticularly relatively smaller ones, may be ill-equipped to deal with even modest probabil-
ities of Type II error in their early stages of shared savings activities. Thus, some payers
may find it useful in the early stages of shared savings arrangements to minimize the
probability of Type II error (exposing themselves to greater risk of Type I error) until their
contracting providers gain more experience with the management of risks associated with
shared savings. This is a philosophy that some payers have adopted as part of a longer term
strategy to get more providers interested in incentive-based payment models (DeLia and
Cantor 2012; Weissman et al. 2012).
5.2 Analytic extensions
Our analysis of measured savings methodology can be extended to incorporate specific
features that are unique to the many different arrangements that are evolving. For example,
in many of these arrangements, high-cost outliers are topcoded at a threshold level such as
the 99th percentile or a specific dollar figure ranging from $50,000 to $500,000 (Bailit et al.
2012; Medicare Program 2011; Weissman et al. 2012). (If spending is topcoded, for
example, at $100,000, then all patients with spending above this amount would have their
spending recorded as $100,000.) Analytically, topcoding has the effect of reducing both the
mean and the variance of per capita healthcare spending, although the effect tends to be
greater on the variance (Thomas et al. 2004; Iezzoni 2013). While this reduced variance
would lead to the desirable property of lower variance in the ASR, and therefore, lower
probabilities of Type I and II errors in the identification of savings, censoring high-cost
outliers has additional implications. Specifically, provider groups that focus on so-called
“super-user” populations with extremely high utilization levels might not receive credit for
successful interventions if censoring is in place (Gawande 2011; Linkins et al. 2008; Okin
et al. 2000; Sadowski et al. 2009).
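The effect of topcoding on the mean and variance can be illustrated with a small Python sketch. The eight spending amounts below are hypothetical and chosen only to include one high-cost outlier; the $100,000 cap matches the example threshold mentioned in the text. The sketch compares the proportional reduction in the mean with the proportional reduction in the variance.

```python
from statistics import mean, pvariance


def topcode(spending, cap):
    """Record every spending amount above `cap` as `cap`."""
    return [min(x, cap) for x in spending]


def relative_reduction(before, after):
    """Proportional decrease from `before` to `after`."""
    return (before - after) / before


# HYPOTHETICAL annual spending amounts (dollars) for eight patients.
SPENDING = [1_200, 3_500, 800, 15_000, 240_000, 6_000, 2_500, 95_000]
CAP = 100_000

capped = topcode(SPENDING, CAP)
mean_drop = relative_reduction(mean(SPENDING), mean(capped))
var_drop = relative_reduction(pvariance(SPENDING), pvariance(capped))

if __name__ == "__main__":
    print(f"mean falls by {100 * mean_drop:.1f}%, variance falls by {100 * var_drop:.1f}%")
```

In this illustrative data set the variance falls proportionally more than the mean, which is the pattern described above; the size of the gap in real data would depend on how heavy the upper tail of spending is.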
Although our analysis compares a single-year performance period with a single-year
baseline, other comparison schemes are sometimes used in shared savings arrangements.
Most prominently, the MSSP uses a three-year weighted average for baseline healthcare
spending with greater weight given to more recent years (Medicare Program 2011). Other
arrangements use a baseline period from 6 months to 2 years and a performance period
from 6 to 12 months (Weissman et al. 2012; Bailit et al. 2012). In these cases, the general
principles derived in this paper would remain the same but the calculation of within-patient
spending correlation and other relevant parameter estimates would have to be adjusted to
account for the specific lengths of baseline and performance periods.
We note as well that random variation could affect the statistical precision of healthcare
quality measures, which also form an important part of shared savings arrangements.
Achieving a desired level of statistical precision involves additional challenges when
specific measures apply only to a subset of the assigned patient population (e.g., percentage
of diabetics with properly controlled hemoglobin A1c). The nuances of statistical precision
in healthcare quality measures, though important, are beyond the scope of this paper.
Another important extension concerns the amount of over or underpayment that occurs
as a result of Type I or Type II error, respectively. Conditional on the occurrence of such
an error, the amount of over or underpayment depends on the number of patients covered
under the shared savings agreement and the underlying savings rates. It also depends on
the idiosyncrasies of the specific arrangement that is implemented, such as the shares of
measured savings that are distributed to payers versus providers and the use of first-dollar
provider payments versus payments above a fixed savings threshold. Since Type I and II
errors are necessary conditions for these kinds of over and underpayments to occur, the
main conclusion of the paper still holds—namely, that these types of misallocations can be
minimized through careful design of the savings measurement methodology as described
above.
In practice, payers and providers may use other contractual arrangements to hedge
against the risks from random variation. These include reinsurance purchased by providers
or, as recently suggested, options pricing by payers (Friedberg 2013). The analysis above
provides a framework for pricing the relevant risks that would determine the prices and
other details of these hedging activities. It is important to note, however, that the use of
these risk hedging mechanisms would add to the administrative costs of executing shared
savings arrangements. This problem could be especially acute for smaller provider groups
for whom administrative costs and premiums connected with reinsurance may be higher
relative to larger provider groups. Thus, it would remain important to minimize the
underlying risks as discussed throughout this paper.
Similar risk hedging issues arise for providers engaged in multiple shared savings
arrangements with different payers. In theory, random underpayment from one arrange-
ment might coincide with random overpayment from another, leaving the provider group
approximately held harmless. In practice, however, one cannot guarantee that the relative
risks would be inversely correlated across shared savings arrangements. To the contrary, in
a small to medium sized provider group, consistent but moderate success in reducing
spending could go undetected across multiple arrangements if all such arrangements are
statistically noisy (e.g., as in Approach 3 described above).6
5.3 Caveats and limitations
Our analysis should be considered in light of some caveats and limitations. First, we
considered only one-sided savings models in our calculations. While this focus does not
affect our calculations of ASR variance under different measured savings approaches, the
probabilities of misclassified provider performance would clearly be different in two-sided
arrangements.
At various points of the analysis, we rely heavily on the Central Limit Theorem to justify
our use of the normal distribution in our calculations. Because healthcare spending data is
highly skewed, use of the Central Limit Theorem requires larger samples than would be the
case for samples from more symmetrical distributions. In our calculations, we use sample
sizes of 5,000 or higher, which have been shown to be adequate for skewed data (Yu and
Machlin 2004). Nevertheless, some shared savings arrangements in practice involve ACOs
and PCMHs with smaller numbers of patients. In these instances, the risks of Type I and II
errors are likely higher than those reported here and must be assessed directly without the use
of large sample approximations. Also, we relied on approximation formulas for ratios of
normal variables, which can be sensitive to skewness, and additionally, correlation between
the relevant numerators and denominators. Although we do not expect these considerations to
add any systematic bias to our comparisons of different measured savings approaches, they
remain as issues in need of further analysis in future work.
Though not applied universally, some shared savings arrangements use risk-adjusted
measures of per capita healthcare spending (Weissman et al. 2012; Bailit and Hughes 2011;
Bailit et al. 2012). The great variety and complexity of risk adjustment methods that are used in
practice make the inclusion of risk adjustment in our models intractable. Although we outlined
some of the major considerations regarding risk adjustment and random variation in the ASR,
the role of risk adjustment in this regard remains an important area for future investigation.
6 In theory, statistical noise in this situation might be substantially reduced if all payers contracting with a given provider group pooled their data to measure savings. Many practical considerations (e.g., coordination costs, competition), however, would likely prevent this from happening in practice.
Our numerical calculations are based on MSR thresholds used by the MSSP and on
parameter assumptions used in a recent analysis of that particular program (DeLia et al.
2012). A different combination of MSR thresholds and parameter values would have
produced different probabilities of Type I and II errors. In particular, these probabilities
would have been smaller in populations with smaller healthcare spending variance.
Nevertheless, the general relationships among the ASR measurement approaches derived
in this paper are likely to hold across a wide variety of MSR thresholds and parameter
values.
In our analysis of random variation in the adjustment factor for baseline spending, we
used a simulation methodology that required specific assumptions about the probability
distribution for annual growth in per capita healthcare spending. It would be useful in
future research to examine the robustness of Type I and II error probabilities to different
assumptions about this probability distribution.
6 Conclusion
This paper shows how random variation can lead to large probabilities of Type I and II
errors in the recognition of savings that may or may not be generated by new forms of
patient care management. It also suggests ways in which savings measurement rules can be
designed to limit this random variation. Although such rules must be designed to meet
other implementation goals, our analysis provides insights and general formulas that can be
used to guide the design of measured savings approaches across a wide range of shared
savings arrangements that are currently evolving in the American health sector. That sector
has just entered a period of experimentation and refinement in terms of measuring and
rewarding savings that are generated from improved care coordination.
Acknowledgments This research was supported by the Agency for Healthcare Research & Quality (Grant no. R24-HS019678). Helpful comments were provided by Sujoy Chakravarty and two anonymous reviewers.
Appendix
This appendix provides proofs of mathematical statements that are used in the main text.
Statement A1
$$
V(ASR) = \frac{\sigma_P^2/N}{(\mu_B+\mu_A)^2} + \frac{\mu_P^2(\sigma_B^2/N)}{(\mu_B+\mu_A)^4} + \frac{\mu_P^2(\sigma_A^2/N)}{(\mu_B+\mu_A)^4}
+ 2\left[ -\frac{\mu_P\,\mathrm{cov}(\bar{Y}_P,\bar{Y}_B)}{(\mu_B+\mu_A)^3} - \frac{\mu_P\,\mathrm{cov}(\bar{Y}_P,A)}{(\mu_B+\mu_A)^3} + \frac{\mu_P^2\,\mathrm{cov}(\bar{Y}_B,A)}{(\mu_B+\mu_A)^4} \right]
$$
Proof To apply the needed formulas more easily, let $x_1 = \bar{Y}_P$, $x_2 = \bar{Y}_B$, and $x_3 = A$ and
let $\mu_i$ and $\sigma_i^2$ be the mean and variance for $x_i$ ($i = 1, 2, 3$). Then we can write the ASR as
$ASR = f(x_1, x_2, x_3) = \frac{(x_2+x_3)-x_1}{x_2+x_3}$ and apply the multivariate Taylor series method (Casella
and Berger 2002) to approximate the variance of $f(x_1, x_2, x_3)$ as
$$
V(f(x_1,x_2,x_3)) = \sum_{i=1}^{3}\sum_{j=1}^{3} f_i(\mu_1,\mu_2,\mu_3)\, f_j(\mu_1,\mu_2,\mu_3)\, \mathrm{cov}(x_i,x_j)
$$
where $f_i(\mu_1,\mu_2,\mu_3)$ is the first partial derivative of $f$ with respect to $x_i$ evaluated at the point
$(\mu_1,\mu_2,\mu_3)$. Taking derivatives and rearranging terms gives:
$$
V(f(x_1,x_2,x_3)) = \frac{\sigma_1^2}{(\mu_2+\mu_3)^2} + \frac{\mu_1^2\sigma_2^2}{(\mu_2+\mu_3)^4} + \frac{\mu_1^2\sigma_3^2}{(\mu_2+\mu_3)^4}
+ 2\left[ -\frac{\mu_1\,\mathrm{cov}(x_1,x_2)}{(\mu_2+\mu_3)^3} - \frac{\mu_1\,\mathrm{cov}(x_1,x_3)}{(\mu_2+\mu_3)^3} + \frac{\mu_1^2\,\mathrm{cov}(x_2,x_3)}{(\mu_2+\mu_3)^4} \right].
$$
Rewriting the $x_i$, $\mu_i$, and $\sigma_i^2$ terms in their original form gives the result.
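The step "taking derivatives and rearranging terms" can be spelled out as follows. This is a worked reconstruction for the function $f$ defined in the proof, not text from the original. Writing $f = 1 - x_1/(x_2+x_3)$, the first partial derivatives evaluated at the means are

$$
f_1 = -\frac{1}{\mu_2+\mu_3}, \qquad f_2 = f_3 = \frac{\mu_1}{(\mu_2+\mu_3)^2}.
$$

The squared derivatives give the three variance terms, and the cross products give the covariance terms:

$$
f_1^2 = \frac{1}{(\mu_2+\mu_3)^2}, \qquad f_2^2 = f_3^2 = \frac{\mu_1^2}{(\mu_2+\mu_3)^4}, \qquad
f_1 f_2 = f_1 f_3 = -\frac{\mu_1}{(\mu_2+\mu_3)^3}, \qquad f_2 f_3 = \frac{\mu_1^2}{(\mu_2+\mu_3)^4}.
$$

Each cross product appears twice in the double sum (once as $(i,j)$ and once as $(j,i)$), which accounts for the factor of 2 on the bracketed covariance terms.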
Statement A2
$$\mathrm{cov}(\bar{Y}_P, \bar{Y}_B) = \rho\sigma_P\sigma_B/N$$
Proof By definition, $\rho = \frac{\mathrm{cov}(Y_j^B, Y_j^P)}{\sigma_B\sigma_P}$, where $Y_j^B$ is the level of health spending for person j in
the baseline period and $Y_j^P$ is the level of health spending for person j in the performance
period. Also by definition, $\mathrm{cov}(\bar{Y}_P, \bar{Y}_B) = E[(\bar{Y}_B - \mu_B)(\bar{Y}_P - \mu_P)]$. But this can be rewritten as
$$
E\left[\left(\frac{1}{N}\sum_i Y_i^B - \mu_B\right)\left(\frac{1}{N}\sum_i Y_i^P - \mu_P\right)\right] = E\left[\left(\frac{1}{N}\sum_i y_i^B\right)\left(\frac{1}{N}\sum_i y_i^P\right)\right]
$$
where $y_i^B$ is the deviation of $Y_i^B$ from its mean and $y_i^P$ is the deviation of $Y_i^P$ from its mean. Therefore, we can write
$$
\mathrm{cov}(\bar{Y}_P, \bar{Y}_B) = \frac{1}{N^2} E\left[(y_1^B + \dots + y_N^B)(y_1^P + \dots + y_N^P)\right] = \frac{1}{N^2}\, N\, \mathrm{cov}(Y_j^B, Y_j^P) = \rho\sigma_P\sigma_B/N.
$$
The second to last step derives from the assumptions that $\mathrm{cov}(Y_j^B, Y_k^P) = 0$ for all $k \neq j$ and
that $\mathrm{cov}(Y_j^B, Y_j^P)$ is the same for every patient j.
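Statement A2 can be checked numerically. The Python sketch below is an illustration under stated assumptions rather than part of the original proof: patient-level spending deviations are drawn as mean-zero bivariate normal pairs (real spending is skewed), patients are independent of one another, and the panel size, number of replications, and seed are arbitrary. It compares the simulated covariance of the two sample means with the formula ρσPσB/N.

```python
import random
from math import sqrt


def theoretical_cov(n_patients, rho, sigma_b, sigma_p):
    """Statement A2: covariance of the two sample means."""
    return rho * sigma_p * sigma_b / n_patients


def simulate_cov_of_means(n_patients, rho, sigma_b, sigma_p, reps=4_000, seed=7):
    """Sample covariance of (baseline mean, performance mean) across simulated panels."""
    rng = random.Random(seed)
    means_b, means_p = [], []
    for _ in range(reps):
        total_b = total_p = 0.0
        for _ in range(n_patients):
            # Same patient in both periods: correlated deviations from the mean.
            z1 = rng.gauss(0.0, 1.0)
            z2 = rng.gauss(0.0, 1.0)
            total_b += sigma_b * z1
            total_p += sigma_p * (rho * z1 + sqrt(1.0 - rho ** 2) * z2)
        means_b.append(total_b / n_patients)
        means_p.append(total_p / n_patients)
    avg_b = sum(means_b) / reps
    avg_p = sum(means_p) / reps
    cross = sum((b - avg_b) * (p - avg_p) for b, p in zip(means_b, means_p))
    return cross / (reps - 1)


if __name__ == "__main__":
    estimate = simulate_cov_of_means(25, 0.5, 2.0, 3.0)
    print(f"simulated: {estimate:.4f}  formula: {theoretical_cov(25, 0.5, 2.0, 3.0):.4f}")
```

The simulated value should fall close to the formula value, with the gap shrinking as the number of replications grows.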
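Statement A3 Suppose $\sigma_P^2 = \sigma_B^2 = \sigma^2$. Then under the null hypothesis that
$\mu_P = \mu_B + \mu_A$,

$$
\frac{\mu_P^2(\sigma_B^2/N)}{(\mu_B+\mu_A)^4} - 2\left[\frac{(\mu_P\,\rho\,\sigma_P\sigma_B)/N}{(\mu_B+\mu_A)^3}\right] < 0
$$

if and only if $\rho > 0.5$.

Proof Using the assumption that $\sigma_P^2 = \sigma_B^2 = \sigma^2$, the term of interest can be rewritten as

$$
\frac{\mu_P\sigma^2}{N(\mu_B+\mu_A)^3}\left[\frac{\mu_P}{\mu_B+\mu_A} - 2\rho\right].
$$

This term will be negative if and only if $\left[\frac{\mu_P}{\mu_B+\mu_A} - 2\rho\right] < 0$. Under the
null hypothesis, this means that $(1 - 2\rho) < 0$, which occurs if and only if $\rho > 0.5$.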
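A short numerical illustration of Statement A3 is sketched below. It evaluates the term of interest under the null hypothesis for a few values of $\rho$ and shows the sign change at $\rho = 0.5$. The function name and all parameter values are hypothetical.

```python
def a3_term(rho, mu_p, mu_b, mu_a, sigma, n):
    """Baseline-variance term minus twice the covariance term, with equal variances."""
    d = mu_b + mu_a
    baseline_variance_part = mu_p**2 * (sigma**2 / n) / d**4
    covariance_part = 2 * (mu_p * rho * sigma * sigma / n) / d**3
    return baseline_variance_part - covariance_part


# Hypothetical values; mu_p is set to mu_b + mu_a to impose the null hypothesis.
mu_b, mu_a, sigma, n = 9500.0, 500.0, 20000.0, 5000
mu_p = mu_b + mu_a

for rho in (0.3, 0.5, 0.7):
    print(rho, a3_term(rho, mu_p, mu_b, mu_a, sigma, n))
```

The printed value should be positive for $\rho = 0.3$, zero up to rounding for $\rho = 0.5$, and negative for $\rho = 0.7$.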
Statement A4
$$
P_1 = 1 - \Phi\left[\frac{(k - s)(\mu_B + \mu_A)}{\sigma/\sqrt{N}}\right]
$$

Proof By definition $P_1 = P(\mathrm{ASR} > k)$ under Approach 1. Putting ASR into standardized
form, we have

$$
P(\mathrm{ASR} > k)
= P\left(\frac{\mathrm{ASR} - E(\mathrm{ASR})}{\sqrt{V(\mathrm{ASR})}} > \frac{k - E(\mathrm{ASR})}{\sqrt{V(\mathrm{ASR})}}\right)
= 1 - \Phi\left(\frac{k - E(\mathrm{ASR})}{\sqrt{V(\mathrm{ASR})}}\right)
= 1 - \Phi\left[\frac{(k - s)(\mu_B + \mu_A)}{\sigma/\sqrt{N}}\right].
$$
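The formula in Statement A4 is straightforward to evaluate numerically. The sketch below uses Python's standard-library normal distribution. The function name and the parameter values are hypothetical. Reading $s = 0$ as "no true savings" (so that $P_1$ is a Type I error probability) and $1 - P_1$ with $s > k$ as a Type II error probability is an interpretive assumption based on the abstract, not something stated in this appendix.

```python
from math import sqrt
from statistics import NormalDist


def p1_reward_probability(s, k, mu_b, mu_a, sigma, n):
    """P(ASR > k) under Approach 1, following Statement A4."""
    z = (k - s) * (mu_b + mu_a) / (sigma / sqrt(n))
    return 1.0 - NormalDist().cdf(z)


# Hypothetical values, chosen only for illustration.
mu_b, mu_a, sigma, n, k = 9500.0, 500.0, 20000.0, 5000, 0.02

# Reward probability when the expected savings rate is zero.
type_i = p1_reward_probability(0.0, k, mu_b, mu_a, sigma, n)

# Probability of missing the threshold when the expected savings rate is 5%.
type_ii = 1.0 - p1_reward_probability(0.05, k, mu_b, mu_a, sigma, n)

print(type_i, type_ii)
```

Rerunning the sketch with a larger `n` shows both probabilities falling, since the standard error $\sigma/\sqrt{N}$ shrinks.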
Statement A5
$$
P_2 = \Phi\left[\frac{(s - k)(\mu_B + \mu_A)}{\sqrt{(\sigma^2/N)\big(k^2 + 2(1 - k)(1 - \rho)\big)}}\right]
$$

Proof As described in the main text, under Approach 2, ASR is the ratio of two
normally distributed random variables $Z = (\bar{Y}_B + A) - \bar{Y}_P$ and $W = \bar{Y}_B + A$. Although
the distribution of $Z/W$ is fairly complex, it can be greatly simplified if we assume that
$P(W > 0) = 1$ (Marsaglia 1965, 2006). This assumption is clearly innocuous in the
shared savings context, as per capita health spending is always greater than zero. To
determine the likelihood that a provider group would be rewarded for savings, we must
calculate the probability that $Z/W > k$ for a given MSR threshold $k$. If we define a new
random variable $Q = kW - Z$, we can apply the formula developed by Marsaglia
(1965, 2006) to calculate the required probability as
$P_2 = P(\mathrm{ASR} > k) = P(Q < 0) = \Phi(-\mu_Q/\sigma_Q)$, where $\mu_Q$ and $\sigma_Q$ are the mean and
standard deviation of $Q$. Through direct substitution and calculation, it can be shown
that $-\mu_Q = (s - k)(\mu_B + \mu_A)$ and $\sigma_Q = \sqrt{(\sigma^2/N)\big[k^2 + 2(1 - k)(1 - \rho)\big]}$, which
completes the proof.
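The reduction from the ratio $Z/W$ to the normal variable $Q$ can be checked by simulation. The sketch below compares the closed-form $P_2$ with a Monte Carlo estimate of $P(Z/W > k)$. Several things here are assumptions made for illustration: the function names, the parameter values, holding $A$ fixed at $\mu_A$, and setting $\mu_P = (1 - s)(\mu_B + \mu_A)$, which is what the expression for $-\mu_Q$ implies.

```python
import random
from math import sqrt
from statistics import NormalDist


def p2_reward_probability(s, k, rho, mu_b, mu_a, sigma, n):
    """P(ASR > k) under Approach 2, following Statement A5."""
    numerator = (s - k) * (mu_b + mu_a)
    denominator = sqrt((sigma**2 / n) * (k**2 + 2 * (1 - k) * (1 - rho)))
    return NormalDist().cdf(numerator / denominator)


def simulate_p2(s, k, rho, mu_b, mu_a, sigma, n, reps=200_000, seed=1):
    """Monte Carlo estimate of P(Z / W > k) with correlated normal sample means."""
    rng = random.Random(seed)
    se = sigma / sqrt(n)                 # standard error of each sample mean
    mu_p = (1 - s) * (mu_b + mu_a)       # implied by the definition of s (assumption)
    hits = 0
    for _ in range(reps):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        ybar_b = mu_b + se * z1
        ybar_p = mu_p + se * (rho * z1 + sqrt(1 - rho**2) * z2)
        w = ybar_b + mu_a                # A held fixed at mu_a (assumption)
        if (w - ybar_p) / w > k:
            hits += 1
    return hits / reps


# Hypothetical values, chosen only for illustration.
args = dict(s=0.0, k=0.02, rho=0.5, mu_b=9500.0, mu_a=500.0, sigma=20000.0, n=5000)
analytic = p2_reward_probability(**args)
simulated = simulate_p2(**args)
print(analytic, simulated)
```

Because $W$ is many standard errors above zero for these inputs, the event $Z/W > k$ coincides with $Q < 0$ in essentially every draw, so the two estimates should differ only by simulation noise.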
Statement A6 Let $x = s - k$ and define $f(x) = P_2 - P_1$. Then $f(x) = \Phi(Jx) - \Phi(Ix)$,
where

$$
J = \frac{\mu_B + \mu_A}{\sqrt{(\sigma^2/N)\big(k^2 + 2(1 - k)(1 - \rho)\big)}}
\quad\text{and}\quad
I = \frac{\mu_B + \mu_A}{\sigma/\sqrt{N}}.
$$

Proof Using the definitions of $J$ and $I$ and Statements A4 and A5, we have
$f(x) = \Phi(Jx) - [1 - \Phi(-Ix)]$. Using the symmetry of the standard normal distribution,
$\Phi(-Ix) = 1 - \Phi(Ix)$. Therefore, $f(x) = \Phi(Jx) - \Phi(Ix)$ as required.
Statement A7 The relationships listed in Table 1 in the main text follow directly from
the properties of f(x).
Proof First observe that $f(0) = \Phi(0) - \Phi(0) = 0.5 - 0.5 = 0$. Now suppose that $x > 0$.
If $k < 1 - 2\rho$, then it can be shown with straightforward algebra that $I > J$, and therefore
$f(x) < 0$ [since $\Phi(\cdot)$ is monotonically increasing]. Similarly, if $k > 1 - 2\rho$, then $f(x) > 0$.
Now suppose to the contrary that $x < 0$. If $k < 1 - 2\rho$, then $I > J \Rightarrow Ix < Jx$, so
$f(x) > 0$. By similar reasoning, if $k > 1 - 2\rho$, then $f(x) < 0$. Finally, if $k = 1 - 2\rho$, then
$I = J$ and $f(x) = 0$ for every $x$. These relationships are summarized in Table 4.
The contents of Table 1 in the main text can be derived from Table 4 by using the
formulas $x = s - k$ and $f(x) = P_2 - P_1$.
Table 4 Values of $f(x)$ under alternative assumptions

| | $x < 0$ | $x = 0$ | $x > 0$ |
|---|---|---|---|
| $k < 1 - 2\rho$ | $f(x) > 0$ | $f(x) = 0$ | $f(x) < 0$ |
| $k = 1 - 2\rho$ | $f(x) = 0$ | $f(x) = 0$ | $f(x) = 0$ |
| $k > 1 - 2\rho$ | $f(x) < 0$ | $f(x) = 0$ | $f(x) > 0$ |
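The sign pattern in Table 4 can be reproduced numerically from the definitions of $I$ and $J$ in Statement A6. The sketch below is illustrative only: the function names are hypothetical, and the values of $k$ and $\rho$ are chosen so that $1 - 2\rho$ is exactly representable in floating point, not because they are realistic thresholds or correlations.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf


def f_diff(x, k, rho, mu_b, mu_a, sigma, n):
    """f(x) = P2 - P1 = Phi(J x) - Phi(I x), following Statement A6."""
    d = mu_b + mu_a
    big_i = d / (sigma / sqrt(n))
    big_j = d / sqrt((sigma**2 / n) * (k**2 + 2 * (1 - k) * (1 - rho)))
    return PHI(big_j * x) - PHI(big_i * x)


def sign_label(value, tol=1e-12):
    """Classify a value as '>0', '<0' or '=0', allowing for rounding error."""
    if value > tol:
        return ">0"
    if value < -tol:
        return "<0"
    return "=0"


# Hypothetical values: rho = 0.25 gives 1 - 2*rho = 0.5, so the three k values
# below fall under, on, and over the boundary.
rho, mu_b, mu_a, sigma, n = 0.25, 9500.0, 500.0, 20000.0, 5000
rows = {}
for k in (0.25, 0.5, 0.75):
    rows[k] = [sign_label(f_diff(x, k, rho, mu_b, mu_a, sigma, n))
               for x in (-0.01, 0.0, 0.01)]
    print(f"k={k}: {rows[k]}")
```

Each printed row corresponds to one row of Table 4, with columns ordered $x < 0$, $x = 0$, $x > 0$.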
References
Ash, A.S., Ellis, R.P., Pope, G.C., Ayanian, J.Z., Bates, D.W., Burstin, H., Lezzoni, L.I., MacKay, E., Yu, W.: Using diagnoses to describe populations and predict costs. Health Care Financ. Rev. 21(3), 7–28 (2000)
Bailit, M., Hughes, C.: Key design elements of shared-savings payment arrangements. Commonwealth Fund, New York (2011)
Bailit, M., Hughes, C., Burns, M., Freedman, D.H.: Shared-savings payment arrangements in health care: six case studies. Commonwealth Fund, New York (2012)
Burke, G.: Moving toward accountable care in New York. United Hospital Fund, New York (2013)
Casella, G., Berger, R.L.: Statistical inference, 2nd edn. Duxbury, Pacific Grove (2002)
Centers for Medicare & Medicaid Services: Medicare shared savings program: shared savings and losses and assignment methodology specifications, Version 2 April. CMS, Baltimore (2013)
Centers for Medicare & Medicaid Services: Pioneer Accountable Care Organization (ACO) model request for application. CMS, Baltimore (2011)
Centers for Medicare & Medicaid Services: Health expenditures by state of residence. Centers for Medicare & Medicaid Services. http://www.cms.gov/NationalHealthExpendData/downloads/resident-state-estimates.zip (2011b). Accessed 20 Dec 2012
DeLia, D., Cantor, J.C.: Recommended approach for calculating savings in the NJ Medicaid ACO demonstration project. Rutgers Center for State Health Policy, New Brunswick (2012)
DeLia, D., Hoover, D., Cantor, J.C.: Statistical uncertainty in the Medicare Shared Savings Program. Medicare Medicaid Res. Rev. 2(2), E1–E15 (2012)
Friedberg, M.: Option pricing: a flexible tool to disseminate shared savings contracts. Presented at the Academy Health annual research meeting (2013)
Gawande, A.: The hot spotters: can we lower medical costs by giving the neediest patients better care? The New Yorker. http://www.newyorker.com/reporting/2011/01/24/110124fa_fact_gawande?currentPage=all (2011). Accessed 31 July 2013
Iezzoni, L. (ed.): Risk adjustment for measuring health care outcomes, 4th edn. Health Administration Press, Chicago (2013)
Kauter, J.: Risk adjustment in the medicare ACO shared savings program. Presented at the Academy Health annual research meeting (2013)
Lewis, V.A., McClurg, A.B., Smith, J., Fisher, E.S., Bynum, J.P.W.: Attributing patients to accountable care organizations: performance year approach aligns stakeholders’ interests. Health Aff. (Millwood) 32(3), 587–594 (2013)
Linkins, K.W., Brya, J.J., Chandler, D.W.: Frequent users of health services initiative: final evaluation report. Lewin Group, Falls Church (2008)
Marsaglia, G.: Ratios of normal variables and ratios of sums of uniform variables. J. Am. Stat. Assoc. 60(309), 193–204 (1965)
Marsaglia, G.: Ratios of normal variables. J. Stat. Softw. 16(4), 1–10 (2006)
McGinnis, T., Small, D.M.: Accountable care organizations in medicaid: emerging practices to guide program design. Center for Health Care Strategies Inc., Hamilton (2012)
Medicare Program; Medicare Shared Savings Program: Accountable Care Organizations. 76 Fed. Reg. 67802 (to be codified at 42 C.F.R. pt. 425) (2011)
Mehrotra, A., Adams, J.L., Thomas, J.W., McGlynn, E.A.: The impact of different attribution rules on individual physician cost profiles. Ann. Intern. Med. 152(10), 649–654 (2010)
New Jersey Public Law: Medicaid accountable care demonstration project, Chapter 114 (2011)
Okin, R.L., Boccellari, A., Azocar, F., Shumway, M., O’Brien, K., Gelb, A., Kohn, M., Harding, P., Wachsmuth, C.: The effects of clinical case management on hospital service use among ED frequent users. Am. J. Emerg. Med. 18(5), 603–608 (2000)
Owen, R.: Health care delivery systems demonstration project. Presented at the seventh national medicaid congress (2012)
Pantely, S.E.: Whose patient is it? Patient attribution in ACOs. Milliman healthcare reform briefing paper (2011)
Pope, G.: Attributing patients to physicians for pay for performance. Chapter 7. In: Cromwell, J., Triolini, M.G., Pope, G., Mitchell, J.B., Greenwald, L.M. (eds.) Pay for performance in health care: methods and approaches. RTI Press Publication No. BK-002-1103. RTI Press, Research Triangle Park (2011)
Probert, M.: Maine care: accountable communities initiative. Presentation at the seventh national medicaid congress (2012)
Richman, B.D., Schulman, K.A.: A cautious path forward on accountable care organizations. JAMA 305(6), 602–603 (2011)
Robinson, J.C.: Accountable care organization for PPO patients: challenge and opportunity in California. Integrated Healthcare Association, Oakland (2011)
Sadowski, L.S., Kee, R.A., VanderWeele, T.J., Buchanan, D.: Effect of a housing and case management program on emergency department visits and hospitalizations among chronically ill homeless adults: a randomized trial. JAMA 301(17), 1771–1778 (2009)
Thomas, W.J., Grazier, K.L., Ward, K.: Comparing accuracy of risk-adjustment methodologies used in economic profiling of physicians. Inquiry 41(2), 218–231 (2004)
van Belle, G.: Statistical rules of thumb. Wiley, New York (2002)
Weissman, J.S., Bailit, M., D’Andrea, G., Rosenthal, M.B.: The design and application of shared savings: lessons from early adopters. Health Aff. (Millwood) 31(9), 1959–1968 (2012)
Yu, W.W., Machlin, S.: Examination of skewed health expenditure data from the medical expenditure panel survey (MEPS). Working Paper No. 04002. Agency for Healthcare Research and Quality, Rockville (2004)