Another easy-to-digest presentation on epidemiology. This presentation focuses on how tests used in medical research are assessed, especially kits used to screen large populations.
VALIDITY AND RELIABILITY OF SCREENING TESTS
SCREENING OR DIAGNOSTIC TESTS
Epidemiology is about (among other things) determining the prevalence or incidence of disease in populations.
Usually, a population is examined to decide whether a condition is present or not.
A screening procedure aims at early detection of a disease process.
Both the procedure and the examiner must be valid and reliable.
VALIDITY & RELIABILITY:
Validity (accuracy) is the ability to recognize whether a condition is present or absent.
There must also be reliability (consistency).
Reliability is the ability to produce the same finding when the examination is done more than once.
VALIDITY:
Simply the ability of a test to do what it purports to do (to be ACCURATE),
i.e. to correctly categorize those that are +ve and to correctly categorize those that are -ve.
Consider a diagnostic test with dichotomous results (e.g. Dipstix).
                 DISEASE STATUS / DIAGNOSIS (TRUTH)
SCREENING TEST   POSITIVE   NEGATIVE   TOTAL
POSITIVE         a (TP)     b (FP)     a+b
NEGATIVE         c (FN)     d (TN)     c+d
TOTAL            a+c        b+d        a+b+c+d
VALIDITY Cont….:
a = Those with the disease whom the test detects (True positives - TP)
b = Those without the disease whom the test says have it (False positives - FP)
c = Those with the disease whom the test says don't have it (False negatives - FN)
d = Those without the disease whom the test says don't have it (True negatives - TN)
Measuring (Quantifying) validity:
Sensitivity = proportion of positives (those with the disease) that the test is able to detect,
i.e. a / (a + c) (the probability that a +ve will be called +ve).
(Able to give +ve findings when the person has the disease.)
Measures of validity cont…:
Specificity = proportion of those without the disease that the test is able to identify as negative,
i.e. d / (b + d) (the probability that a -ve will be called -ve).
(Able to give -ve findings when the person has no disease.)
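The two formulas above can be checked with a short Python sketch; the cell counts a, b, c, d below are hypothetical, chosen only to illustrate the arithmetic.

```python
# Sensitivity and specificity from a 2x2 screening table.
# Cells follow the table above: a = TP, b = FP, c = FN, d = TN.

def sensitivity(a: int, c: int) -> float:
    """Proportion of diseased persons the test detects: a / (a + c)."""
    return a / (a + c)

def specificity(d: int, b: int) -> float:
    """Proportion of non-diseased persons the test clears: d / (b + d)."""
    return d / (b + d)

# Hypothetical counts for illustration:
a, b, c, d = 90, 40, 10, 160   # TP, FP, FN, TN

print(f"Sensitivity = {sensitivity(a, c):.0%}")   # 90 / (90 + 10)
print(f"Specificity = {specificity(d, b):.0%}")   # 160 / (40 + 160)
```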
Measures of validity cont…:
Accuracy thus combines sensitivity and specificity. As sensitivity ↑, false negatives (FN) ↓. As specificity ↑, false positives (FP) ↓.
Measures of validity cont…: In setting a test's cut-off point
(favouring sensitivity or specificity), one must consider the consequences of missing a positive or a negative. ↑Sensitivity (and ↓specificity) when the disease
is serious and treatment exists, or when it is spreading at a high rate (HIV!).
Measures of validity cont…: It is desirable to have high (100%) sensitivity
and specificity. In real life it isn't so, especially with continuous variables.
Lowering the criterion for +ve means more people with the disease will test +ve (↑sensitivity), but people without the disease will also ↑ among
those testing positive (↓specificity). (Thus the test will be very sensitive but less specific.)
When the criterion is raised, those correctly classed as without the disease will ↑ (↑specificity), but detected cases will ↓. Thus the test will be more specific but less sensitive.
Measuring (Quantifying) validity…:
Predictive Values: The accuracy of a test is alternatively described as
the extent to which being categorized as positive or negative predicts the presence or absence of the disease.
This is given as the positive and negative predictive values.
Measures of validity cont…:
Positive Predictive Value (PV+):
= (predictive value of a positive test) the percentage of persons deemed positive by the new test and confirmed so by the standard.
Measures of validity cont…: Negative Predictive Value (PV-):
= (predictive value of a negative test) the percentage of persons deemed negative by the new test and confirmed so by the standard.
(This is the proportion of people correctly labeled as diseased or not diseased.)
                 GOLD STANDARD (DIAGNOSTIC) TEST
SCREENING TEST   +        -        TOTAL
+                TP (a)   FP (b)   TP+FP
-                FN (c)   TN (d)   FN+TN
TOTAL            TP+FN    FP+TN    TP+FP+FN+TN
Validity…:
Sensitivity = TP / (TP + FN)
Specificity = TN / (FP + TN)
PV+ = TP / (TP + FP)
PV- = TN / (FN + TN)
Measures of validity cont…:
PV+ = a / (a + b) (proportion of +ves by the test who actually have the disease).
PV- = d / (c + d) (proportion of -ves by the test who are actually without the disease).
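The two predictive-value formulas can be sketched the same way; the counts below are hypothetical, for illustration only.

```python
# Positive and negative predictive values from the same 2x2 layout:
# PV+ = a / (a + b), PV- = d / (c + d).

def ppv(a: int, b: int) -> float:
    """Proportion of test positives who truly have the disease."""
    return a / (a + b)

def npv(d: int, c: int) -> float:
    """Proportion of test negatives who are truly disease-free."""
    return d / (c + d)

a, b, c, d = 90, 40, 10, 160   # hypothetical TP, FP, FN, TN

print(f"PV+ = {ppv(a, b):.1%}")   # 90 / 130
print(f"PV- = {npv(d, c):.1%}")   # 160 / 170
```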
Measures of validity cont…:
In a rare disease, PV- is high because most of those tested will be -ve.
Predictive values depend not only on the validity of the test (sensitivity, specificity) but also on the prevalence of the disease.
Measures of validity cont…: A test that is more specific makes a person with a
+ve test more likely to have the disease; thus the greater the PV+ (more accurately spotting the +ves).
A test that is more sensitive makes a person with a -ve test more likely to be free of the disease; thus the greater the PV-.
No matter how specific a test is, the positives in a disease with low prevalence are likely to be false positives.
PREDICTIVE VALUE & SPECIFICITY OF TEST
Specificity is one factor that affects the predictive value of a test.
An increase in specificity results in a much greater ↑ in PV+ than does the same ↑ in sensitivity.
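The dependence of PV+ on prevalence follows directly from the 2×2 cells. A minimal Python sketch, using a hypothetical test with 95% sensitivity and 95% specificity:

```python
# PV+ as a function of prevalence, for fixed sensitivity and specificity:
# PV+ = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev)).

def ppv_at_prevalence(prev: float, sens: float, spec: float) -> float:
    tp = sens * prev                 # true-positive fraction of the population
    fp = (1 - spec) * (1 - prev)     # false-positive fraction
    return tp / (tp + fp)

# Hypothetical test: 95% sensitivity, 95% specificity.
for prev in (0.01, 0.10, 0.40):
    print(f"prevalence {prev:>4.0%}: PV+ = {ppv_at_prevalence(prev, 0.95, 0.95):.1%}")
```

Even this excellent test yields a PV+ of only about 16% at 1% prevalence, illustrating the point above about low-prevalence positives.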
Worked examples (each a 2×2 table of screening test (+/-) against diagnostic test (+/-), N = 1000):
1. Prev. = 50%, Sens. = 50%, Spec. = 50%, PV+ = ??
2. Prev. = 20%, Sens. = 50%, Spec. = 50%, PV+ = ??
3. Prev. = 20%, Sens. = 90%, Spec. = 50%, PV+ = ??
4. Prev. = 20%, Sens. = 50%, Spec. = 90%, PV+ = ??
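The four PV+ values marked "??" can be filled in by building each 2×2 table; a short Python sketch:

```python
# PV+ for the four worked examples above (N = 1000 in each case).

def ppv_from_counts(n: int, prev: float, sens: float, spec: float) -> float:
    diseased = n * prev
    tp = sens * diseased                  # true positives
    fp = (1 - spec) * (n - diseased)      # false positives
    return tp / (tp + fp)

scenarios = [
    (0.50, 0.50, 0.50),   # 1. Prev 50%, Sens 50%, Spec 50%
    (0.20, 0.50, 0.50),   # 2. Prev 20%, Sens 50%, Spec 50%
    (0.20, 0.90, 0.50),   # 3. Prev 20%, Sens 90%, Spec 50%
    (0.20, 0.50, 0.90),   # 4. Prev 20%, Sens 50%, Spec 90%
]
for prev, sens, spec in scenarios:
    pv = ppv_from_counts(1000, prev, sens, spec)
    print(f"Prev {prev:.0%}, Sens {sens:.0%}, Spec {spec:.0%} -> PV+ = {pv:.1%}")
```

Comparing scenarios 3 and 4 shows the point made above: raising specificity from 50% to 90% lifts PV+ far more than the same raise in sensitivity.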
PREDICTIVE VALUE & SPECIFICITY OF TEST
Relationship between disease prevalence and predictive value in a test with 95% sensitivity and 95% specificity:
At 0 prevalence, the chance that a -ve test means no disease is 100% (PV-), and the chance that a +ve test means disease is 0% (PV+).
A rise in prevalence is accompanied by a rise in PV+ and a decrease in PV-. By 40% prevalence, PV+ has risen close to its peak while PV- has declined.
[Figure: Predictive value (%) plotted against prevalence of disease (%), with one curve for the negative test (PV-) and one for the positive test (PV+).]
PREDICTIVE VALUE & SPECIFICITY OF TEST
Most of the gain in PV+ from increasing prevalence occurs at the lowest rates of disease prevalence, i.e.
moving from 1% to 5% prevalence raises the positive predictive value from about 17% to 51%.
(Exercise: Prev. = 20%; Pop. = 1000; Sensitivity = 90%; Specificity = 80% - calculate PV+.)
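Working the exercise just stated, step by step:

```python
# Exercise: Prev = 20%, N = 1000, Sensitivity = 90%, Specificity = 80%.
n, prev, sens, spec = 1000, 0.20, 0.90, 0.80

diseased = n * prev                  # 200 people with the disease
tp = sens * diseased                 # 180 true positives
fp = (1 - spec) * (n - diseased)     # 160 false positives
pv_plus = tp / (tp + fp)

print(f"PV+ = {tp:.0f} / {tp + fp:.0f} = {pv_plus:.1%}")   # 180 / 340 ≈ 52.9%
```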
RELATIONSHIP OF DISEASE PREVALENCE TO PREDICTIVE VALUE
EXAMPLE: SENSITIVITY = 99%, SPECIFICITY = 95%

DISEASE PREV.  TEST RESULT  SICK  NOT SICK  TOTALS  PREDICTIVE VALUE (+VE)
1%             +
               -
               TOTALS                       10,000
5%             +
               -
               TOTALS                       10,000
RELATION BETWEEN SPECIFICITY AND PREDICTIVE VALUE
EXAMPLE: PREVALENCE = 10%, SENSITIVITY = 100%

SPECIFICITY  TEST RESULT  SICK  NOT SICK  TOTALS  PREDICTIVE VALUE (+VE)
70%          +
             -
             TOTALS                       10,000
95%          +
             -
             TOTALS                       10,000
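The blank cells in the two exercise tables above follow directly from the stated prevalence, sensitivity, and specificity; a Python sketch computing them:

```python
# Filling in the two blank exercise tables above (N = 10,000 in each).

def cells(n, prev, sens, spec):
    """Return (TP, FP, FN, TN) for a population of n."""
    sick = n * prev
    tp, fn = sens * sick, (1 - sens) * sick
    fp = (1 - spec) * (n - sick)
    tn = (n - sick) - fp
    return tp, fp, fn, tn

# Table 1: sensitivity 99%, specificity 95%, prevalence 1% vs 5%.
for prev in (0.01, 0.05):
    tp, fp, fn, tn = cells(10_000, prev, 0.99, 0.95)
    print(f"prev {prev:.0%}: TP={tp:.0f} FP={fp:.0f} FN={fn:.0f} TN={tn:.0f}  "
          f"PV+ = {tp / (tp + fp):.1%}")

# Table 2: prevalence 10%, sensitivity 100%, specificity 70% vs 95%.
for spec in (0.70, 0.95):
    tp, fp, fn, tn = cells(10_000, 0.10, 1.00, spec)
    print(f"spec {spec:.0%}: TP={tp:.0f} FP={fp:.0f}  PV+ = {tp / (tp + fp):.1%}")
```

Note how the first table reproduces the roughly 17% → 51% jump in PV+ between 1% and 5% prevalence mentioned earlier.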
Validity Cont…:
Why worry about disease prevalence? The higher the prevalence, the higher the
positive predictive value. A screening test is more efficient if targeted
at a high-risk population. Screening low-prevalence populations can be
wasteful and yields few detected cases for the large effort applied.
SUMMARY Cont…:
Sensitivity: is calculated from the test results of diseased persons.
It is totally independent of the test results of the non-diseased.
SUMMARY Cont….: Specificity:
is calculated from the test results of non-diseased persons. It is totally independent of the test results of the diseased.
Predictive values rely on the results of both the diseased and the non-diseased. A high predictive value is always preferred.
SUMMARY Cont…:
Altering the cut-off point of a diagnostic test may affect sensitivity and specificity, e.g. BP for hypertension.
↑BP is defined as diastolic 90 mmHg or more, but some hypertensives fall between
80 mmHg and 90 mmHg. If the cut-off is reduced to 80 mmHg, i.e. all with
80 mmHg or more are labelled hypertensive,
SUMMARY Cont….:
all the true hypertensives (+ves) will be detected (↑sensitivity),
but false +ves among those without hypertension will also ↑, which is ↓specificity.
So the test will be very sensitive but not specific. When we ↑ the cut-off point to 100 mmHg diastolic, those without hypertension will nearly all be classed negative,
↑ true negatives (↑specificity), but detected cases will ↓ (↓ true positives),
which is ↓ sensitivity. So the test will be very specific but not sensitive.
SUMMARY Cont…: In setting sensitivity or specificity levels, one must
consider the consequences of: missing actual cases (positives, e.g. Ca. cervix); missing actual negatives (HIV).
↑Sensitivity when the disease is serious and treatment exists, or when it is spreading at a high rate and is serious.
↑Specificity (PV+) when the treatment procedure is cumbersome and expensive (e.g. mastectomy).
But when early detection is important for complete cure and treatment is invasive, balance the two.
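The cut-off trade-off described above can be illustrated numerically. The diastolic readings below are hypothetical, invented purely to show how moving the threshold shifts sensitivity against specificity:

```python
# How moving a diagnostic cut-off trades sensitivity against specificity.
# All readings (mmHg diastolic) are hypothetical, for illustration only.

hypertensive     = [82, 88, 92, 95, 101, 104, 110, 118]   # true cases
non_hypertensive = [68, 72, 75, 78, 81, 84, 86, 89]       # true non-cases

def sens_spec(cutoff):
    """Sensitivity and specificity when 'cutoff or above' is called positive."""
    sens = sum(x >= cutoff for x in hypertensive) / len(hypertensive)
    spec = sum(x < cutoff for x in non_hypertensive) / len(non_hypertensive)
    return sens, spec

for cutoff in (80, 90, 100):
    s, p = sens_spec(cutoff)
    print(f"cut-off {cutoff} mmHg: sensitivity {s:.0%}, specificity {p:.0%}")
```

Lowering the cut-off to 80 mmHg catches every case but mislabels half the non-cases; raising it to 100 mmHg does the reverse, exactly as the summary above argues.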
RELIABILITY (REPEATABILITY, PRECISION, REPRODUCIBILITY)
A test gives consistent results when repeated on the same person under the same conditions.
Four sources of variability can affect the reproducibility of a screening test:
Inherent biological variability in the person being tested, e.g. BP varies in individuals under differing circumstances.
Reliability of the instrument or test method, e.g. when temperature ↑ or equipment is tilted.
Intra-observer variability.
Reliability Cont….: Inter-observer variability
- Two observers.
- The extent to which observers agree or disagree can be put in quantitative terms.
Calculating Overall (%) Agreement: X-RAYS

                           RADIOLOGIST (OBSERVER 2)
RADIOLOGIST (OBSERVER 1)   NORMAL   SUSPECT   DOUBTFUL   ABNORMAL
NORMAL                     (A)      B         C          D
SUSPECT                    E        (F)       G          H
DOUBTFUL                   J        K         (L)        M
ABNORMAL                   N        O         P          (Q)
Overall (%) Agreement
Percent Agreement = (A + F + L + Q) / (total readings, i.e. total x-rays read) × 100.
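With concrete counts, the calculation is just the sum of the diagonal cells over the total; the 4×4 matrix below is hypothetical, for illustration only:

```python
# Overall percent agreement between two observers on a 4x4 reading table.
# Hypothetical matrix: rows = observer 1, columns = observer 2,
# categories in order: normal, suspect, doubtful, abnormal.

readings = [
    [60,  4,  2,  1],   # observer 1: normal
    [ 3, 10,  2,  1],   # observer 1: suspect
    [ 1,  2,  5,  2],   # observer 1: doubtful
    [ 0,  1,  1,  5],   # observer 1: abnormal
]

total = sum(sum(row) for row in readings)
agreed = sum(readings[i][i] for i in range(4))    # diagonal cells A, F, L, Q
percent_agreement = agreed / total * 100

print(f"Agreement = {agreed}/{total} = {percent_agreement:.1f}%")
```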
In general, most people who are tested have negative results.
Considerable agreement is therefore found (between two observers) on negative or normal tests, i.e. when there is no disease it is easier for both observers to agree.
% Agreement…: When one calculates percentage agreement over all subjects (the population), the percent agreement may be high because of the high agreement among negative tests. (Those with obvious disease are few; doubtful cases are more difficult and also few.)
             OBSERVER 2
OBSERVER 1   +   -
+            a   b
-            c   d        (can ignore d)
% Agreement…: This high value of percent agreement,
driven by the -ve tests, tends to conceal significant disagreements between the observers in identifying subjects as positive.
Hence a / (a + b + c) addresses percent agreement only in regard to identifying the sick.
Kappa Statistic (coefficient): Agreement between two observers can occur
purely by chance, e.g. if there is no standard or criterion for reading x-rays, agreement in many cases is purely by chance.
The question we ask is: to what extent do their readings agree beyond
what we would expect by chance alone? Or:
to what extent does the agreement between the two observers exceed the level of agreement that would result from chance alone?
Kappa Statistic (coefficient): The Kappa statistic is used to
calculate this extent.
Kappa numerator: percent observed agreement minus percent agreement expected by chance alone (deals with the actual observations).
Denominator: the difference between full (100%) agreement and the percent agreement expected by chance alone.
Thus Kappa quantifies the extent to which observed agreement exceeds that which would be expected by chance alone.
Kappa Statistic (coefficient): To calculate Kappa, first calculate the observed agreement.
A identifies 45 slides, i.e. 60% of the 75 total, as grade II.
B identifies 44, or 58.7%, of all slides as grade II.
To calculate % agreement the formula is: (a + d) / (a + b + c + d) × 100%.
In this case the % observed agreement is: (41 + 27) / 75 × 100 = 90.7%.
                  PATHOLOGIST A
                  GRADE II    GRADE III   TOTAL
PATHOLOGIST B
GRADE II          41 (a)      3 (b)       44 (58.7%)
GRADE III         4 (c)       27 (d)      31 (41.3%)
TOTAL             45 (60%)    30 (40%)    75
Kappa Statistic (coefficient): If the two pathologists used entirely different
sets of criteria, how much agreement would be expected solely on the basis of chance?
A read 60% of all 75 slides as grade II.
If A applied criteria independent of those used by B,
then A would read as grade II 60% of the slides
that B called grade II, and 60% of those that B called grade III would also be read grade II by A.
Thus, of the slides called grade II by B: 60% × 44 = 26.4.
Expected by chance alone:
                  PATHOLOGIST A
                  GRADE II    GRADE III   TOTAL
PATHOLOGIST B
GRADE II          26.4 (a)    17.6 (b)    44 (58.7%)
GRADE III         18.6 (c)    12.4 (d)    31 (41.3%)
TOTAL             45 (60%)    30 (40%)    75
Kappa Statistic (coefficient):
60% of the slides called grade III by B will be read grade II by A: 60% × 31 = 18.6.
Thus the agreement expected by chance alone = (26.4 + 12.4) / 75 × 100 = 51.7%.
Kappa Statistic (coefficient):
Kappa = (% observed agreement − % agreement expected by chance) / (100% − % agreement expected by chance)
      = (90.7% − 51.7%) / (100% − 51.7%) = 39.0 / 48.3 = 0.81.
Kappa Statistic (coefficient):
It is suggested that a Kappa of:
0.75 and above is excellent agreement beyond chance;
0.40 and below is poor agreement;
between 0.40 and 0.75 is intermediate agreement.
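The whole Kappa calculation for the pathologist example above can be verified in a few lines of Python, using the observed cell counts a = 41, b = 3, c = 4, d = 27:

```python
# Kappa for the pathologist example: a=41, b=3, c=4, d=27 (N = 75 slides).

a, b, c, d = 41, 3, 4, 27
n = a + b + c + d                         # 75 slides in total

observed = (a + d) / n                    # 68/75 = 90.7% observed agreement

# Chance agreement from each reader's marginal proportions:
p_a_ii = (a + c) / n                      # A calls 60% of slides grade II
p_b_ii = (a + b) / n                      # B calls 58.7% of slides grade II
chance = p_a_ii * p_b_ii + (1 - p_a_ii) * (1 - p_b_ii)   # 51.7%

kappa = (observed - chance) / (1 - chance)
print(f"observed {observed:.1%}, chance {chance:.1%}, kappa = {kappa:.2f}")
```

The result, kappa ≈ 0.81, falls in the "excellent agreement beyond chance" band just described.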