
EDITORIALS

Evaluating Diagnostic Tests: What Is Done -- What Needs to be Done

ACCURATE DIAGNOSIS is central to the practice of modern medicine. Establishing a correct diagnosis often depends on the availability of accurate diagnostic tests. However, assessment of the accuracy and clinical value of diagnostic tests is an imperfect and rapidly evolving science, fraught with many unresolved issues. As internists who use diagnostic tests, teach their appropriate use to medical students and trainees, and are involved in the assessment of tests, we must better understand the principles and shortcomings of diagnostic test evaluation.

The primary purpose of diagnostic tests is to provide clinical information that can discriminate among disease states, thereby improving the physician's management of the patient. Diagnostic tests can also be used to screen for disease in asymptomatic individuals, to monitor the course of disease, or to establish prognosis in patients with established diagnoses who are undergoing therapy. Thus, tests can be evaluated from the perspective of their ability to discriminate accurately between diseased and non-diseased individuals, their marginal information relative to other tests and procedures, their impact on subsequent management decisions, or their ultimate impact on the patient's health. However, all evaluations must determine whether, in specific clinical situations, the beneficial impact of new diagnostic information warrants the health and resource costs required to obtain it.

Fineberg1 has proposed a hierarchical approach to the assessment of diagnostic tests that involves determination of technical capacity (ability to present precise, accurate, reproducible information), diagnostic accuracy (ability to discriminate between patients with and without disease), and clinical value -- diagnostic impact (influence on the use of other diagnostic tests and procedures), therapeutic impact (influence on the selection and delivery of more appropriate therapy), and patient outcome impact (contribution to improved patient health). Each stage of this evaluation process is dependent on the successful performance of the test in the previous stages. While almost ten years old, these principles remain useful. However, their implementation is difficult.

Technical capacity is determined by ascertaining the test's validity (agreement between the mean test result and the true biological factor being measured) and reliability (degree of variance that occurs when the test is repeated on the same specimen). This is easily and commonly done for automated laboratory tests. It is more difficult and less often performed for diagnostic tests that depend more on operator performance and interpretation.
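For automated laboratory tests these two properties can be estimated directly from repeated runs. A minimal sketch in Python, with hypothetical numbers (the true value and the five repeat results are assumptions for illustration only):

```python
import statistics

# Hypothetical illustration: repeated measurements of a single specimen
# whose true value is taken as known for the sake of the example.
true_value = 5.0
repeats = [5.2, 4.9, 5.1, 5.3, 4.8]  # same specimen, five runs

mean_result = statistics.mean(repeats)
bias = mean_result - true_value  # validity: agreement of the mean result with the truth
sd = statistics.stdev(repeats)   # reliability: spread across repetitions
cv = sd / mean_result            # coefficient of variation, a common summary

print(f"mean = {mean_result:.2f}, bias = {bias:+.2f}, SD = {sd:.2f}, CV = {cv:.1%}")
```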

Accuracy is assessed by comparing a test's results with those of a reference standard. When a single criterion for a normal or abnormal test result is used, the sensitivity (ability of the test to detect disease when it is present) and specificity (ability to exclude disease when it is absent) of the test can be determined. Diagnostic accuracy can be better determined by considering multiple criteria, using receiver operating characteristic (ROC) curve analysis and likelihood ratio analysis (for interval results).
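To make the definitions concrete, the following sketch computes these quantities from a hypothetical two-by-two table (all counts are invented for illustration); recomputing the same quantities at a series of positivity criteria traces out the ROC curve:

```python
# Hypothetical 2x2 table: test result versus reference standard.
tp, fn = 90, 10   # diseased patients: test positive / test negative
fp, tn = 15, 85   # non-diseased patients: test positive / test negative

sensitivity = tp / (tp + fn)   # P(test positive | disease present)
specificity = tn / (tn + fp)   # P(test negative | disease absent)

# Likelihood ratios summarize how much a result shifts the odds of disease.
lr_positive = sensitivity / (1 - specificity)
lr_negative = (1 - sensitivity) / specificity

print(f"sensitivity = {sensitivity:.2f}")  # 0.90
print(f"specificity = {specificity:.2f}")  # 0.85
print(f"LR+ = {lr_positive:.1f}")          # 6.0
print(f"LR- = {lr_negative:.2f}")          # 0.12
```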

Estimation of a test's accuracy is subject to a number of methodologic errors. Inappropriate application of basic epidemiologic principles can lead to a variety of biases and wrong conclusions.2,3 Exclusion of indeterminate or uninterpretable results from the analysis can produce overly optimistic estimates of accuracy.
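A small invented example shows how the arithmetic inflates. Suppose that among 100 patients with disease a test is positive in 70, indeterminate in 20, and negative in 10:

```python
# Hypothetical counts among 100 diseased patients.
positive, indeterminate, negative = 70, 20, 10

# Dropping indeterminate results from the denominator inflates apparent sensitivity...
sens_excluding = positive / (positive + negative)                  # 70/80  = 0.875
# ...compared with treating an indeterminate result as a non-positive one.
sens_including = positive / (positive + indeterminate + negative)  # 70/100 = 0.70

print(f"excluding indeterminates: {sens_excluding:.2f}")
print(f"counting them as non-positive: {sens_including:.2f}")
```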

A less frequently recognized problem is the selection of an appropriate reference standard. All reference standards are imperfect, and it is not always clear which of the available reference standards should be used. In this issue, Centor et al. explicitly address the former problem in their article on the diagnosis of streptococcal pharyngitis (page xx), and Goodson et al. implicitly address the latter when they consider varying criteria for a significant mean blood glucose level in patients with diabetes mellitus (page xx). Future efforts to improve assessments of diagnostic technologies should address the problems relating to imperfect reference standards.4

Most assessments of diagnostic technology evaluate a test in isolation from the clinical information that is already available. Yet the true value of a diagnostic test is determined by the marginal information it provides: what the new test can tell us over and above what we already know. Various methods (Bayes' theorem, multivariate analysis, ROC curves, Shannon's information content, physician probability estimates) have been used, but such assessment is infrequent.
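For example, Bayes' theorem in its odds-likelihood form makes the point directly: the same test result carries different marginal information depending on the pre-test probability implied by what is already known. A minimal sketch (the likelihood ratio and pre-test probabilities are hypothetical):

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Update a pre-test probability of disease using a test's likelihood ratio."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

lr_pos = 6.0  # hypothetical likelihood ratio of a positive result
for pre in (0.05, 0.30, 0.80):
    post = post_test_probability(pre, lr_pos)
    print(f"pre-test {pre:.0%} -> post-test {post:.0%}")
# A positive result moves 5% to 24%, 30% to 72%, and 80% to 96%:
# the test's marginal contribution depends on what was known beforehand.
```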

The clinical value of a diagnostic test is the impact of the test's performance on the patient's care and outcome. Assessment of clinical value is difficult and, therefore, less commonly done. The clinical value of a diagnostic test will vary according to the specifics of the clinical situation, including the available therapeutic options. Determining the impact of a single test on the performance of other tests and procedures or on the institution of therapy is especially difficult, particularly retrospectively and when diagnostic tests are used in combination. In addition, as Kroenke et al. note in their article evaluating the admission urinalysis (page xx), such determinations require clinical data that are often difficult to obtain. The link between performance of a diagnostic test and a clinically based outcome becomes more difficult to establish as the number of intervening factors -- elapsed time, the therapy chosen, patient compliance, and coexisting disease -- increases.

Using clinical criteria in assessing diagnostic technologies also requires that one distinguish between assessment of efficacy (performance under ideal conditions of use) and that of effectiveness (performance under average conditions of use).5 Studies of clinical effectiveness are important because diagnostic tests are applied under average conditions of use. Two of the studies published in this issue report the limited clinical effectiveness of home urine testing in patients with diabetes mellitus and of routine admission urinalysis in hospitalized patients. However, limitations in observed clinical effectiveness may result from either inadequate test efficacy or inadequate application of the test by the user. Wherever possible, such distinctions should be attempted.

A conceptual base exists for critically assessing diagnostic tests. Although recent evaluations of diagnostic tests are much improved over those performed in the past, most are still inadequate for clinicians and policymakers. More improvement is necessary. Most important, there is an especially strong need to relate the performance of a diagnostic test to the subsequent care and outcome of the patient. Appropriate clinical endpoints need to be better defined and measured.

We should also improve our evaluation methodology. We should increase the use of rigorous experimental and quasi-experimental designs and state-of-the-art analytic methods. The recently published report of the Institute of Medicine's Committee for Evaluating Medical Technologies in Clinical Use, Assessing Medical Technologies,6 is required reading for anyone who intends to engage in technology assessment activities.

More research on the evaluation of diagnostic tests is needed to provide a strong basis for assessing their clinical usefulness. Only when we have developed the ability to better measure the impact of diagnostic tests on the patient's health will we know how to use them more appropriately. -- J. Sanford Schwartz, MD, Section of General Medicine and Department of Medicine, Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, Pennsylvania.

REFERENCES
1. Fineberg HV, Bauman R, Sosman M. Computerized cranial tomography: effect on diagnostic and therapeutic plans. JAMA 1977;238:224-30
2. Ransohoff DF, Feinstein AR. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med 1978;299:926-30
3. Rozanski A, Diamond GA, Berman D, Forrester JS, Morris D, Swan HJC. The declining specificity of exercise radionuclide ventriculography. N Engl J Med 1983;309:518-22
4. Schwartz JS. Evaluating diagnostic technologies. In: Assessing Medical Technologies (Committee for Evaluating Medical Technologies in Clinical Use, Institute of Medicine). Washington, DC: National Academy Press, 1985:80-9
5. Office of Technology Assessment, U.S. Congress. Assessing the efficacy and safety of medical technologies. Washington, DC: U.S. Government Printing Office, 1978
6. Committee for Evaluating Medical Technologies in Clinical Use, Institute of Medicine. Assessing Medical Technologies. Washington, DC: National Academy Press, 1985

The Uncertain Lessons of the Quinlan Case

THE DEATH, last year, of Karen Ann Quinlan has generated many discussions about ethical questions involved with the definition of death and with the prolongation of life in hopelessly ill patients. Her tragic story, living in a coma for nine years after being taken off her respirator, focused national attention on such topics as the "quality of life" and the "right to die." As a result, a consensus seems to have developed about the right of individuals to control decisions relating to their own medical care, including the right to end their lives. Led by organizations such as the Society for the Right to Die and the Hemlock Society, legislation is being proposed in most states to establish Humane and Dignified Death Acts. California, in 1976, led the nation by enacting its Natural Death Act,