Diagnostic Test Studies
"How to read a paper" workshop, 87/3/2
Kamran Yazdani, MD MPH
Diagnostic tests
When looking at a paper about a diagnostic test, we ask ourselves three questions:
- Is this test useful?
- Is it reliable?
- Is it valid?
Is this test useful?
The test should have been researched in a study population relevant to the individual or population in whom it is to be used.
Reliability
Reliability refers to the repeatability or reproducibility of a test.
It can be assessed by repeating the test using the same or different observers.
Validity
Relates to whether the test measures what it purports to measure. Is the result true?
It can be assessed by comparing the test results with a Gold Standard.
Validity
For example, if you measure blood pressure in an obese patient using a cuff that is too small, you are likely to get a falsely high reading. The reading may be reliable (you get the same blood pressure if you repeat the measurement), but it lacks validity.
Gold standard
The gold standard is the test or battery of tests that will most accurately diagnose a particular disease or condition. Examples:
- The OGTT for diabetes
- Fluorescein angiography for diabetic retinopathy (too expensive or invasive for routine use)
- The Jones criteria for rheumatic fever (a battery of tests or symptoms)
What is your variable?
Sensitivity and specificity
The ability of a test to accurately diagnose diseased and healthy individuals is summarized by:
- Sensitivity
- Specificity
- Likelihood ratio
- …
Sensitivity

                       Gold Standard
                       Disease    No disease
Test result  Positive  TP         FP
             Negative  FN         TN

Sensitivity: The capacity of the test to correctly identify diseased individuals in a population; the "TRUE POSITIVES".
Specificity

                       Gold Standard
                       Disease    No disease
Test result  Positive  TP         FP
             Negative  FN         TN

Specificity: The capacity of the test to correctly exclude individuals who are free of the disease; the "TRUE NEGATIVES".
Sensitivity and Specificity

                       Gold Standard
                       Disease    No disease
Test result  Positive  TP         FP
             Negative  FN         TN

Sensitivity = TP / (TP + FN)
Specificity = TN / (FP + TN)
Example

                       Gold Standard
                       Disease    No disease   Total
Test result  Positive  75         20           95
             Negative  25         180          205
Total                  100        200          300

Sensitivity = 75/100 = 75%
Specificity = 180/200 = 90%
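The two formulas can be checked against this table with a few lines of Python (a sketch; the function names are ours, and the counts are taken from the example above):

```python
# Sensitivity and specificity from the 2x2 table in the example above.
def sensitivity(tp, fn):
    """TP / (TP + FN): the fraction of diseased people the test detects."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """TN / (FP + TN): the fraction of healthy people the test clears."""
    return tn / (tn + fp)

# Counts from the example: TP=75, FP=20, FN=25, TN=180
print(sensitivity(75, 25))   # 0.75
print(specificity(180, 20))  # 0.9
```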
Accuracy of the test

                       Gold Standard
                       Disease    No disease   Total
Test result  Positive  a          b            a+b
             Negative  c          d            c+d
Total                  a+c        b+d          a+b+c+d

Accuracy = (a+d) / (a+b+c+d)
Positive Predictive Value

                       Gold Standard
                       Disease    No disease
Test result  Positive  TP         FP
             Negative  FN         TN

PPV: The probability of the disease being present, among those with positive diagnostic test results.
PPV = TP / (TP + FP)
Negative Predictive Value

                       Gold Standard
                       Disease    No disease
Test result  Positive  TP         FP
             Negative  FN         TN

NPV: The probability that the disease was absent, among those whose diagnostic test results were negative.
NPV = TN / (TN + FN)
The effect of sensitivity, specificity, and prevalence on PPV and NPV

                             PPV at prevalence          NPV at prevalence
Sensitivity   Specificity    90%     50%     10%        90%     50%     10%
70%           60%            94%     64%     16%        18%     67%     95%
70%           90%            98.4%   88%     44%        25%     75%     96%
80%           90%            98.6%   89%     47%        33%     82%     98%
90%           90%            98.7%   90%     50%        50%     90%     99%
100%          5%             90%     51%     10%        100%    100%    100%
5%            100%           100%    100%    100%       10%     51%     90%
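Any cell of this table can be reproduced from sensitivity, specificity, and prevalence alone; a sketch in Python (function names are ours, shown for the first row of the table):

```python
def ppv(sens, spec, prev):
    """Positive predictive value: expected TP fraction among all positives."""
    tp = sens * prev
    fp = (1 - spec) * (1 - prev)
    return tp / (tp + fp)

def npv(sens, spec, prev):
    """Negative predictive value: expected TN fraction among all negatives."""
    tn = spec * (1 - prev)
    fn = (1 - sens) * prev
    return tn / (tn + fn)

# First row of the table: sensitivity 70%, specificity 60%
for prev in (0.90, 0.50, 0.10):
    print(f"prevalence {prev:.0%}: PPV {ppv(0.70, 0.60, prev):.0%}, "
          f"NPV {npv(0.70, 0.60, prev):.0%}")
```

Note how PPV collapses and NPV rises as prevalence falls, even though the test itself is unchanged.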
The predictive values thus depend on sensitivity and specificity as well as on prevalence. What do we do in the clinic?
Likelihood ratio
Likelihood of (+) test in diseased Likelihood of (+) test in diseased personspersonsLR Positive =LR Positive =
Likelihood of (+) test in healthy Likelihood of (+) test in healthy personspersons
Likelihood of (-) test in diseased Likelihood of (-) test in diseased personspersonsLR Negative=LR Negative=
Likelihood of (-) test in healthy Likelihood of (-) test in healthy personspersons
SensitivitySensitivityLR Positive =LR Positive =
1 - Specificity1 - Specificity
1 - Sensitivity1 - SensitivityLR Negative=LR Negative=
SpecificitySpecificity
Likelihood ratio

Sensitivity = 90%
Specificity = 90%

LR+ = Sensitivity / (1 - Specificity) = 0.90 / (1 - 0.90) = 9
LR- = (1 - Sensitivity) / Specificity = (1 - 0.90) / 0.90 = 1/9
LR nomogram: a graphical tool for converting a pre-test probability into a post-test probability using the likelihood ratio.
Example
5000 pregnant women underwent a test for blood glucose at 24 weeks, following a glucose load. 243 women were found to have a blood glucose greater than 6.8 mmol/L and were referred for an OGTT. 186 were found to have gestational diabetes. Four women who initially had tested negative were diagnosed as having diabetes later in their pregnancy.
            Diabetes   No diabetes   Total
Positive    186        57            243
Negative    4          4753          4757
Total       190        4810          5000
Example

Prevalence:                 190/5000             = 3.8%
Sensitivity:                186/190              = 97.9%
Specificity:                4753/4810            = 98.8%
Positive predictive value:  186/243              = 76.5%
Negative predictive value:  4753/4757            = 99.9%
Likelihood ratio + test:    (186/190)/(57/4810)  = 82.6
Likelihood ratio - test:    (4/190)/(4753/4810)  = 0.02
Accuracy:                   (186+4753)/5000      = 98.8%
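All of these measures follow from the four cells of the table; a Python sketch reproducing the numbers (variable names are ours):

```python
# Gestational diabetes example: TP=186, FP=57, FN=4, TN=4753
tp, fp, fn, tn = 186, 57, 4, 4753
n = tp + fp + fn + tn                          # 5000 women in total

prevalence  = (tp + fn) / n                    # 190/5000  -> 3.8%
sensitivity = tp / (tp + fn)                   # 186/190   -> 97.9%
specificity = tn / (tn + fp)                   # 4753/4810 -> 98.8%
ppv         = tp / (tp + fp)                   # 186/243   -> 76.5%
npv         = tn / (tn + fn)                   # 4753/4757 -> 99.9%
lr_pos      = sensitivity / (1 - specificity)  # ~ 82.6
lr_neg      = (1 - sensitivity) / specificity  # ~ 0.02
accuracy    = (tp + tn) / n                    # 4939/5000 -> 98.8%

print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}, accuracy = {accuracy:.1%}")
```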
Continuous Measurements

[Figures: overlapping distributions of IOP (proportion on the y-axis) in healthy and diseased groups, shown three times with the cut-off value for a positive test in different positions. Each cut-off divides the results into TN/TP and FP/FN regions: moving the cut-off trades false positives against false negatives.]
Receiver operator curves
By plotting the sensitivity and specificity of a test for different cut-off points, an ROC curve can be produced, which helps illustrate the optimum cut-off point to use.
Receiver Operator Characteristic (ROC) Curve

[Figure: ROC curve plotting sensitivity (y-axis, 0 to 1) against 1 - specificity (x-axis, 0 to 1).]
ROC Curve Analysis

[Figure: ROC curve with numbered cut-off points 1-6 plotted on axes of sensitivity against 1 - specificity (FPR), both running from 0.00 to 1.00.]
Receiver operator curves

[Figure: ROC curve of true positive rate against false positive rate (0 to 1 on both axes) for creatine kinase for diagnosing MI, with cut-offs >40, >80, and >280 marked along the curve.]
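The way such a curve is built can be sketched by sweeping a cut-off over two samples of test results (the values below are invented for illustration, not the actual CK data):

```python
# Hypothetical test values (higher = more suggestive of disease)
healthy  = [30, 35, 40, 50, 60, 70, 90, 120]
diseased = [60, 80, 150, 200, 260, 300, 400, 500]

def roc_point(cutoff):
    """Return (FPR, TPR) when 'value > cutoff' counts as a positive test."""
    tpr = sum(v > cutoff for v in diseased) / len(diseased)  # sensitivity
    fpr = sum(v > cutoff for v in healthy) / len(healthy)    # 1 - specificity
    return fpr, tpr

# Raising the cut-off moves down the curve: fewer false positives,
# but fewer true positives as well.
for cutoff in (40, 80, 280):
    fpr, tpr = roc_point(cutoff)
    print(f">{cutoff}: sensitivity {tpr:.3f}, 1 - specificity {fpr:.3f}")
```

Plotting each (FPR, TPR) pair for every possible cut-off traces the full ROC curve.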
CASP checklist
Such checklists improve the accuracy and completeness of research reporting and allow readers to assess the potential for bias in the reported study.
Standards for Reporting of Diagnostic Accuracy (STARD)
Always use:
- A flow chart or diagram
- A checklist
STARD checklist
TITLE/ABSTRACT/KEYWORDS
1. Identify the article as a study of diagnostic accuracy (recommend MeSH heading 'sensitivity and specificity').

INTRODUCTION
2. State the research questions or study aims, such as estimating diagnostic accuracy or comparing accuracy between tests or across participant groups.
METHODS
Participants
3. Describe the study population: the inclusion and exclusion criteria, setting and locations where the data were collected.
4. Describe participant recruitment: Was recruitment based on presenting symptoms, results from previous tests, or the fact that the participants had received the index tests or the reference standard?
5. Describe participant sampling: Was the study population a consecutive series of participants defined by the selection criteria in items 3 and 4? If not, specify how participants were further selected.
6. Describe data collection: Was data collection planned before the index test and reference standard were performed (prospective study) or after (retrospective study)?

Test methods
7. Describe the reference standard and its rationale.
8. Describe technical specifications of material and methods involved, including how and when measurements were taken, and/or cite references for index tests and reference standard.
9. Describe definition of and rationale for the units, cut-offs and/or categories of the results of the index tests and the reference standard.
10. Describe the number, training and expertise of the persons executing and reading the index tests and the reference standard.
11. Describe whether or not the readers of the index tests and reference standard were blind (masked) to the results of the other tests, and describe any other clinical information available to the readers.
STARD checklist

METHODS
Statistical methods
12. Describe methods for calculating or comparing measures of diagnostic accuracy, and the statistical methods used to quantify uncertainty (e.g. 95% confidence intervals).
13. Describe methods for calculating test reproducibility, if done.
STARD checklist (continued)

RESULTS
Participants
14. Report when the study was done, including beginning and ending dates of recruitment.
15. Report clinical and demographic characteristics of the study population (e.g. age, sex, spectrum of presenting symptoms, comorbidity, current treatments, recruitment centers).
16. Report the number of participants satisfying the criteria for inclusion that did or did not undergo the index tests and/or the reference standard; describe why participants failed to receive either test (a flow diagram is strongly recommended).

Test results
17. Report the time interval from the index tests to the reference standard, and any treatment administered between.
18. Report the distribution of severity of disease (define criteria) in those with the target condition; other diagnoses in participants without the target condition.
19. Report a cross tabulation of the results of the index tests (including indeterminate and missing results) by the results of the reference standard; for continuous results, the distribution of the test results by the results of the reference standard.
20. Report any adverse events from performing the index tests or the reference standard.
STARD checklist

Estimates
21. Report estimates of diagnostic accuracy and measures of statistical uncertainty (e.g. 95% confidence intervals).
22. Report how indeterminate results, missing responses and outliers of the index tests were handled.
23. Report estimates of variability of diagnostic accuracy between subgroups of participants, readers or centers, if done.
24. Report estimates of test reproducibility, if done.

DISCUSSION
25. Discuss the clinical applicability of the study findings.
Thank You