
Incorporating Patient Preferences and Utilities in the Assessment of Diagnostic Technology: Response¹

Respondent

Anna N. A. Tosteson, ScD

The most prominent role for utilities in medicine is in the estimation of quality-adjusted life years (QALYs). QALYs explicitly account for both mortality and morbidity (1). In the estimation of QALYs, each year of life is not given equal weight. Instead, each year is weighted according to its associated utility. Utilities are preference-based measures of quality of life. They range from 0 to 1, with the best health state typically assigned a utility value of 1 and death a utility value of 0 (2-4). A comprehensive overview of utility definition and measurement is given by Nease (5).
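The weighting just described is simple arithmetic, and a brief sketch may make it concrete. The health states, durations, and utility values below are hypothetical, not drawn from any study cited here:

```python
# Illustration of the QALY calculation described above: each interval of
# life is weighted by the utility of the health state occupied during it,
# with utilities running from 0 (death) to 1 (best health). The profile
# below is hypothetical.

def qalys(health_profile):
    """Sum of (years spent in state) x (utility of that state)."""
    return sum(years * utility for years, utility in health_profile)

# A hypothetical patient: 5 years in full health (utility 1.0), then
# 4 years with a chronic condition (utility 0.7).
profile = [(5.0, 1.0), (4.0, 0.7)]
print(round(qalys(profile), 2))  # 5*1.0 + 4*0.7 = 7.8 QALYs over 9 life-years
```

Nine years of life thus count as only 7.8 QALYs once the years lived at reduced utility are down-weighted.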

Before highlighting the types of utility assessment and the issues that arise for both policy and individual patient decision making, it is instructive to look to the literature to see how utilities have been used in radiologic technology assessment. In particular, consider the evaluation of screening mammography for the detection of breast cancer. In assessing the effect of breast cancer screening programs on QALYs, de Haes et al (6) reviewed 176 studies that addressed breast cancer screening and quality of life. Only 1% (n = 2) of the 176 studies addressed utility, in stark contrast with the many studies addressing the physical (75%), psychological (65%), and social (38%) effects of breast cancer on quality of life. The dearth of data on utilities in breast cancer and mammography screening is surprising, given the large number of cost-effectiveness evaluations that have been undertaken in this area (7).

Acad Radiol 1999; 6(suppl 1):S109-S112

¹From the Clinical Research Section, Department of Medicine, and Center for Evaluative Clinical Sciences, Department of Community and Family Medicine, Dartmouth Medical School, Hanover, NH. Supported in part by grant AG12262 from the National Institute on Aging of the National Institutes of Health, U.S. Public Health Service. Address reprint requests to the author, HB 7505 Clinical Research, Dartmouth-Hitchcock Medical Center, 1 Medical Center Dr, Lebanon, NH 03756.

©AUR, 1999

In addition to highlighting the absence of utility measurement for screening mammography, the article by de Haes et al (6) exemplifies how utilities can be used within a decision-analytic model to clarify the relative harms and benefits associated with a diagnostic technology. Using the MISCAN breast cancer model (8,9) with utilities (ie, each health state is assigned a utility) and conducting extensive analyses, investigators assessed the effect of a nationwide breast cancer screening program on QALYs. Particular emphasis was given to potential adverse consequences of breast cancer screening (eg, increased anxiety, false-positive mammograms). On balance, it was found that adjusting for quality of life had no substantial effect on overall benefit.

While this example focused exclusively on screening's effect on the health of the population without regard to cost, most policy applications of utility assessment involve estimation of QALYs for economic evaluations. Indeed, QALYs are the recommended effectiveness metric for determining the economic value of interventions in health and medicine (1). To highlight the importance of QALYs in economic evaluation, consider a costly intervention that does not lengthen life but greatly reduces morbidity. If evaluated on the basis of years of life saved, this intervention would be considered an inappropriate use of resources. If evaluated on the basis of QALYs, it may provide a benefit worth its cost.

To facilitate comparisons across disease domains, utilities for use in cost-effectiveness evaluation must be measured on a "policy scale." Thus, it is utility for current health, measured on a policy scale with perfect health as an upper anchor (1.0) and death as a lower anchor (0), that is often most relevant (10). There are several approaches to measuring utility for current health, including automated utility assessment in individual subjects using computer-based instruments, such as U-Titer (11,12), or preference classification systems, such as the Health Utility Index (13,14) or EuroQol (15,16). Preference classification systems are used to assign population-based values to health states, which are defined on the basis of responses to a health status questionnaire (see Nease [5] for further details on measurement techniques).
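The earlier point about a morbidity-reducing intervention can be made numerically. The sketch below uses invented figures to show how the two effectiveness metrics diverge:

```python
# Numerical version of the example above: a costly intervention that adds
# no life-years but reduces morbidity for the rest of life. All figures
# are invented for illustration.

years = 10.0      # remaining life expectancy, unchanged by the intervention
u_without = 0.6   # utility of current health without the intervention
u_with = 0.9      # utility with the intervention
cost = 60_000.0   # added cost of the intervention

life_years_gained = 0.0                       # no survival benefit at all
qalys_gained = years * (u_with - u_without)   # 10 * 0.3 = 3.0 QALYs
cost_per_qaly = cost / qalys_gained           # 60,000 / 3.0 = $20,000/QALY

# Judged by life-years saved, the intervention looks worthless; judged by
# QALYs, it costs $20,000 per QALY gained.
print(life_years_gained, qalys_gained, cost_per_qaly)
```

The same intervention thus goes from "zero effectiveness" to a conventionally acceptable cost-effectiveness ratio purely by switching the metric from life-years to QALYs.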

The choice of measurement approach (ie, direct assessment vs preference classification systems) is not trivial. Few studies have compared direct assessment of utility for current health with preference classification system approaches. In an ongoing study of the effect of osteoporosis on the health of elderly women with hip and/or vertebral fractures, we have found good correspondence between utility assessment with the time trade-off on life expectancy and the Health Utility Index preference classification system (unpublished data). It remains to be seen, however, whether direct utility assessment and preference classification systems produce similar utilities in subjects with less physically debilitating health conditions.

One method for implementing utility assessment in the clinical trials context is the Beaver Dam Health Study's catalog approach (17). The idea behind this approach is that by measuring utility for current health in large populations of subjects, we could develop data resources appropriate for studying the basic descriptive epidemiology of utilities. As more sensitive diagnostic technologies are introduced, it may thus become possible, with QALYs as the outcome measure, to identify and measure the effect of stage shifts in what is considered "disease" (18) on the health of the population.

Although longitudinal assessment of utilities in clinical trials may provide important information on the effects of diagnostic tests and treatments on QALYs, differences in utility for current health between diagnostic test arms are likely to be small. Studies with the power to detect small differences will often be infeasible because of the large number of subjects required. Fortunately, small differences in utility are also unlikely to be relevant in either the clinical or the policy setting.
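A standard normal-approximation sample-size calculation illustrates the problem. All planning inputs below (a utility difference of 0.02, a standard deviation of 0.2, 5% two-sided alpha, 80% power) are hypothetical:

```python
# Approximate per-arm sample size for detecting a difference in mean
# utility between two trial arms, using the usual two-sample
# normal-approximation formula. All planning inputs are hypothetical.
from math import ceil

def n_per_arm(delta, sd, z_alpha=1.96, z_beta=0.8416):
    """n = 2 * (z_alpha + z_beta)^2 * sd^2 / delta^2 per arm."""
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

# A small 0.02 utility difference (SD 0.2) needs about 1,570 subjects
# per arm at 5% two-sided alpha and 80% power...
print(n_per_arm(delta=0.02, sd=0.2))
# ...while a large 0.10 difference needs only about 63 per arm.
print(n_per_arm(delta=0.10, sd=0.2))
```

Because the required sample size grows with the inverse square of the detectable difference, trials powered for clinically trivial utility differences quickly become impractically large.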

For trial planning in the diagnostic technology setting, we need to consider the "challenge" receiver operating characteristic (ROC) concept, as described by Phelps and Mushlin (19). This concept is a generalization of the usual ROC curve, which is a graph of true-positive rate (sensitivity) against false-positive rate (1 minus specificity) as the threshold for classifying a test result as positive is varied (20). The challenge ROC is grounded in the decision-theoretic concept of the expected value of clinical information.
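For contrast, the ordinary ROC construction can be sketched directly. The test scores below are invented:

```python
# Sketch of the ordinary ROC construction: sweep the positivity threshold
# across the observed test scores and record (false-positive rate,
# true-positive rate) at each threshold. The scores below are invented.

diseased = [3.1, 2.4, 2.9, 1.8, 3.5]   # test scores in diseased subjects
healthy = [1.2, 0.9, 2.1, 1.5, 0.7]    # test scores in healthy subjects

def roc_points(diseased, healthy):
    """(FPR, TPR) pairs as the positivity threshold is lowered."""
    points = []
    for t in sorted(set(diseased + healthy), reverse=True):
        tpr = sum(x >= t for x in diseased) / len(diseased)  # sensitivity
        fpr = sum(x >= t for x in healthy) / len(healthy)    # 1 - specificity
        points.append((fpr, tpr))
    return points

for fpr, tpr in roc_points(diseased, healthy):
    print(f"FPR = {fpr:.1f}   TPR = {tpr:.1f}")
```

Lowering the threshold moves the operating point up and to the right: sensitivity can only be bought at the price of more false positives.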

[Figure 1. Challenge region for a higher-cost test versus an established diagnostic technology (original ROC curve). (Reprinted, with permission, from reference 19.)]

In this setting, the expected value of clinical information is the net expected change in QALYs obtained by using the new technology, relative to the QALYs obtained by using the best alternative in the absence of the new technology (21). When the prevalence of disease is low, the best alternative in the absence of testing is often no intervention. When the prevalence is high, however, the best alternative in the absence of testing is often empirical treatment. By pairing the expected value of clinical information with the additional cost of a new technology, cost-effectiveness hurdles are established for a new technology. For example, if the cost-effectiveness hurdle is set at $50,000 per QALY gained, a challenge ROC region would be identified that highlights possible combinations of true- and false-positive rates that would result in cost-effectiveness ratios below this threshold (Fig 1).
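A minimal sketch of screening (true-positive rate, false-positive rate) combinations against such a hurdle follows. The QALY-gain expression and every parameter value are hypothetical simplifications for illustration, not the actual Phelps-Mushlin formulation:

```python
# Simplified sketch of a challenge ROC region: for each (false-positive
# rate, true-positive rate) pair, compute a crude expected QALY gain from
# testing and keep the pairs whose incremental cost per QALY falls below a
# $50,000 hurdle. The QALY-gain expression and all parameter values are
# hypothetical simplifications, not the Phelps-Mushlin model itself.

PREVALENCE = 0.10    # prior probability of disease (hypothetical)
GAIN_TP = 0.50       # QALYs gained per diseased patient correctly detected
LOSS_FP = 0.02       # QALYs lost per healthy patient falsely labeled positive
ADDED_COST = 500.0   # added cost of the new test per patient (hypothetical)
HURDLE = 50_000.0    # cost-effectiveness hurdle in $ per QALY gained

def qaly_gain(tpr, fpr):
    """Crude expected QALY gain from testing, per patient."""
    return PREVALENCE * tpr * GAIN_TP - (1 - PREVALENCE) * fpr * LOSS_FP

def in_challenge_region(tpr, fpr):
    """True if this operating point beats the cost-effectiveness hurdle."""
    gain = qaly_gain(tpr, fpr)
    return gain > 0 and ADDED_COST / gain < HURDLE

# Scan a coarse grid of operating points.
grid = [i / 10 for i in range(11)]
region = [(fpr, tpr) for fpr in grid for tpr in grid
          if in_challenge_region(tpr, fpr)]
print(f"{len(region)} of {len(grid) ** 2} operating points clear the hurdle")
```

Only operating points with a high enough true-positive rate, relative to their false-positive rate, clear the hurdle; the surviving set is the challenge region a new test would have to reach.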

When a new diagnostic technology is competing against a highly accurate established technology, the challenge ROC region may become vanishingly small. In establishing the potential of a new technology with the challenge ROC approach, the first hurdle corresponds to the assumption that the new technology can provide perfect information on the underlying disease state. If one is able to rule out the use of the new technology based on cost-effectiveness criteria even when it is assumed to be a perfect test, further investigation can be curtailed. Thus, the QALY-based challenge ROC can be used to highlight situations in which diagnostic trials may be most valuable. An advantage of the challenge ROC over the standard ROC approach is that trade-offs between the outcomes of testing (ie, true-positive, false-negative, false-positive, true-negative) and their effect on QALYs are explicitly addressed.

[Figure 2. (a) Nomogram of the U-Factor (UF) for a perfect test (sensitivity = 1, specificity = 1). (b) Key for U-Factor values. (Reprinted, with permission, from reference 21.)]

Implementation of the challenge ROC generally requires use of a decision-analytic modeling framework. Indeed, an important role for decision analysis in the planning of trials for a new technology is to identify utilities and costs that have a large influence on QALYs and cost-effectiveness. Even when utilities are measured in the context of randomized controlled trials, the limited period of observation often requires that decision-analytic models be used to project the likely effect of interventions on longer-term outcomes. Thus, decision-analytic models, such as MISCAN (8,9), will be necessary for assessing the lifetime effect of alternative health interventions.

At the individual patient decision-making level, we need to consider the assessment of utilities on a different scale. This scale must allow us to measure the disutility of tests and other interventions. Utilities for current health, which are measured on the policy scale of death to perfect health, are unlikely to allow for discrimination between diagnostic tests on attributes that patients value. For example, patients are likely to prefer a test that causes no discomfort to an equally sensitive and specific test that causes short-term discomfort. Assessment of utility for current health in patients undergoing each test, however, is unlikely to reveal a difference between the two tests.

Assessments of how tests are valued by patients should be undertaken directly in clinical trials of diagnostic technologies. This is necessary because the way in which patients value trade-offs between test attributes is not always clear. For example, it is uncertain how patients would value a highly accurate test that caused severe discomfort for 10 minutes relative to a less accurate, painless test. One challenge to assessing utilities for diagnostic tests and test attributes is their transient nature. Metaphors other than risk of death and trade-offs in life expectancy must be used to assess utilities for transient health states. The disutility of short-term side effects in pharmaceutical trials has been effectively assessed as a time trade-off wherein sleep was used as the metaphor for trading time within a 1-year time horizon (22). Because attributes of diagnostic tests are likely to be similar across disease domains (eg, discomfort, time to definitive diagnosis), development of a diagnostic test utility index should be considered. Such an instrument would parallel the approach taken by preference classification systems but would focus on estimating the (dis)utility of a diagnostic test on a more limited scale.
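One hypothetical way such a time trade-off response might be scored is sketched below. The scoring rule, horizon handling, and responses are all invented for illustration and are not the published instrument of reference 22:

```python
# Hypothetical sketch of scoring the disutility of a transient health state
# (eg, a brief, painful diagnostic test) from a time trade-off with a 1-year
# horizon, loosely in the spirit of the sleep-metaphor approach mentioned
# above. The scoring rule and the responses are invented, not the published
# instrument.

HORIZON_DAYS = 365.0

def tto_disutility(days_traded):
    """Fraction of the 1-year horizon a respondent would give up to avoid
    the transient state: 0 = no burden, 1 = the entire horizon."""
    return days_traded / HORIZON_DAYS

# Hypothetical responses for a painful but highly accurate 10-minute test:
days = [0.0, 1.0, 2.0, 7.0]   # days each respondent would trade away
disutilities = [tto_disutility(d) for d in days]
mean_disutility = sum(disutilities) / len(disutilities)
print(round(mean_disutility, 4))
```

Even under this crude scoring, the resulting disutilities are tiny on the death-to-perfect-health scale, which is exactly why a more limited, test-specific scale is attractive.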

For decision making by individual patients, a concept similar to the challenge ROC was introduced in a series of three articles that described patient-oriented performance measures of diagnostic tests (23-25). In particular, the U-Factor performance measure (23) was described; this is based on the prior probability of disease and the decision thresholds at which one would be indifferent between competing management alternatives (ie, do nothing, do an alternate test, treat empirically). The U-Factor, which is equivalent to the expected value of clinical information, is the difference in overall utility with a diagnostic test relative to the appropriate clinical management alternative. Thus, it is explicitly based on utilities for test outcomes and accounts for the disutilities of testing. It may be estimated in the context of a decision-analytic model whose primary end point is QALYs. Nomograms of the U-Factor can be used to highlight the difference in expected QALYs between clinical management alternatives as a function of the prior probability of disease (Fig 2). Such nomograms may be of value for guiding individual patient decisions.
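A U-Factor-style quantity can be sketched as the expected utility of testing minus that of the best no-test alternative. The decision-tree algebra below is a textbook expected-utility calculation with invented utilities; it is a simplified stand-in for, not a reproduction of, the published U-Factor:

```python
# Sketch of a U-Factor-style calculation: expected QALYs when a test guides
# management, minus expected QALYs of the best no-test strategy (do nothing
# vs treat empirically), as a function of the prior probability of disease.
# All utilities and operating characteristics are invented for illustration.

SENS, SPEC = 0.90, 0.90   # hypothetical test sensitivity and specificity
U_TP, U_FN = 9.0, 6.0     # QALYs, diseased patient: treated vs untreated
U_FP, U_TN = 9.5, 10.0    # QALYs, healthy patient: treated vs untreated

def eu_treat_all(p):
    return p * U_TP + (1 - p) * U_FP

def eu_treat_none(p):
    return p * U_FN + (1 - p) * U_TN

def eu_test(p):
    # Treat test-positive patients, observe test-negative patients.
    return (p * (SENS * U_TP + (1 - SENS) * U_FN)
            + (1 - p) * ((1 - SPEC) * U_FP + SPEC * U_TN))

def u_factor(p):
    """Expected QALY gain of testing over the best no-test alternative."""
    return eu_test(p) - max(eu_treat_all(p), eu_treat_none(p))

for p in (0.05, 0.20, 0.50, 0.80, 0.95):
    print(f"prior = {p:.2f}   U-Factor = {u_factor(p):+.3f}")
```

With these invented numbers, testing adds QALYs at low to moderate priors but is dominated by empirical treatment at high priors, mirroring the prevalence discussion above; tabulating such values over a grid of priors is the computational counterpart of the nomograms described in the text.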

In summary, preference-based measures of the effect of diagnostic tests on quality of life are needed. Utility measures should be incorporated into trials to broaden the catalog of health states for which utilities are available. Decision models should be used when planning trials to assess what matters from a QALY viewpoint. The QALY-based challenge ROC may be valuable in the planning of diagnostic clinical trials. Utilities (or disutilities) associated with performing diagnostic tests should also be assessed directly. Future research should address the development of preference classification systems that are specific to the diagnostic test setting.


REFERENCES

1. Sox H, Blatt M, Higgins M, Marton K. Medical decision making. Boston, Mass: Butterworths, 1988.

2. Weinstein M, Fineberg H, Elstein A, et al. Clinical decision analysis. Philadelphia, Pa: Saunders, 1980.

3. Read J, Quinn R, Berwick D, Fineberg H, Weinstein M. Preferences for health outcomes. Med Decis Making 1984; 4:315-329.

4. Gold M, Siegel J, Russell L, Weinstein M. Cost-effectiveness in health and medicine. New York, NY: Oxford University Press, 1996.

5. Nease RF Jr. Utility assessment and clinical trials of diagnostic interventions. Acad Radiol 1999; 6(suppl 1):S103-S108.

6. de Haes JCJM, de Koning HJ, van Oortmarssen GJ, van Agt HME, de Bruyn AE, van der Maas PJ. The impact of a breast cancer screening programme on quality-adjusted life-years. Int J Cancer 1991; 49:538-544.

7. Brown M, Fintor L. Cost-effectiveness of breast cancer screening: preliminary results of a systematic review of the literature. Breast Cancer Res Treat 1993; 25:113-118.

8. de Koning HJ, van Oortmarssen GJ, van Ineveld BM, van der Maas PJ. Breast cancer screening: its impact on clinical medicine. Br J Cancer 1990; 61:292-297.

9. van Oortmarssen GJ, Habbema JDF, van der Maas PJ, et al. A model for breast cancer screening. Cancer 1990; 66:1601-1612.

10. Fryback D, Lawrence WJ. Dollars may not buy as many QALYs as we think: a problem with defining quality-of-life adjustments. Med Decis Making 1997; 17:276-284.

11. Sumner W, Nease R, Littenberg B, Kneeland T, O'Connor G. U-Titer: a utility assessment tool. Med Decis Making 1991; 11:327.

12. Tosteson A, Kneeland T, Nease R, Sumner W, Minshall M, Draper M. Feasibility of automated utility assessment in phase III clinical trials: U-Titer and a raloxifene osteoporosis prevention trial. Med Decis Making 1995; 15:413.

13. Torrance G, Zhang Y, Feeny D, Furlong W, Barr R. Multi-attribute preference functions for a comprehensive health status classification system. CHEPA working paper series No. 92-18. 1992; 1-61.

14. Torrance G, Furlong W, Feeny D, Boyle M. Multi-attribute preference functions. PharmacoEconomics 1995; 7:503-520.

15. EuroQol Working Group. EuroQol: a new facility for the measurement of health-related quality of life. Health Policy 1990; 16:199-208.

16. Dolan P. Modeling valuations for EuroQol health states. Med Care 1997; 35:1095-1108.

17. Fryback D, Dasbach E, Klein R, et al. The Beaver Dam health outcomes study: initial catalog of health-state quality factors. Med Decis Making 1993; 13:89-102.

18. Feinstein A, Sosin D, Wells C. The Will Rogers phenomenon: stage migration and new diagnostic techniques as a source of misleading statistics for survival in cancer. N Engl J Med 1985; 312:1604-1608.

19. Phelps CE, Mushlin AI. Focusing technology assessment using medical decision theory. Med Decis Making 1988; 8:279-289.

20. Swets J, Pickett R. Evaluation of diagnostic systems: methods from signal detection theory. San Diego, Calif: Academic Press, 1982.

21. Tosteson A. Decision analysis in diagnosis and treatment choice. In: Armitage P, Colton T, eds. Encyclopedia of biostatistics. New York, NY: Wiley, 1998.

22. Tosteson A, Houpt L, Kneeland T, et al. Longitudinal analysis of utility data from a randomized controlled health economics trial. Med Decis Making 1997; 17:540.

23. Cain KC, Greenes RA, Begg CB. Patient-oriented performance measures of diagnostic tests. III. U-Factor. Med Decis Making 1984; 4:33-46.

24. Greenes RA, Cain KC, Begg CB. Patient-oriented performance measures of diagnostic tests. I. Tools for prospective evaluation of test order decisions. Med Decis Making 1984; 4:7-15.

25. Greenes RA, Begg CB, Cain KC, Swets JA, Feehrer CE, McNeil BJ. Patient-oriented performance measures of diagnostic tests. II. Assignment potential and assignment strength. Med Decis Making 1984; 4:17-31.
