
Course overview, the diagnostic process, and measures of interobserver agreement

Thomas B. Newman, MD, MPH

September 18, 2008

Overview

Administrative stuff
Overview of the course
The diagnostic process
Interobserver agreement
– Continuous variables
– Categorical variables
  • Concordance
  • Kappa
    – Regular
    – Weighted

Administrative stuff

Introductions
Basic structure of course
– New material each week in lecture
– Read material before lecture if possible
– HW on that material due the following week in section
– Exceptions:
  • No class October 9
  • Penultimate class 12/4 – Chapter 12 (Challenges for EBD) and course review; pass out take-home exam; no HW on Ch 12
  • Last lecture 12/11: review of take-home exam
Lectures: mixture of PPT and whiteboard
– How many want paper copies of PPT slides?

Sections

Section assignments: click ROSTER on the Epi 204 website
Section rooms: click SCHEDULE on the website
Faculty will rotate; students, rooms, and TAs will be constant for the quarter

Homework

Required – a key way of learning the material
Which problems are assigned is announced in SECTION and (later) posted on the web
Not graded if late, but can still be turned in; answers on the web
Use fresh sheets of paper with your name on each – not syllabus pages, not e-mail. (You can download and word-process if you want, but print a copy unless your section leader prefers electronic.)
Will be graded by section leaders and returned the following week

Getting help

Classmates, then section leaders, then faculty
Ambiguous/confusing problems – send e-mail to your section leader or me
– Unless you indicate otherwise, we will assume we can cc the whole class when we respond, if we think the question is of general interest

Textbook

TBN and MAK have almost finished a book, “Evidence-Based Diagnosis” (Cambridge University Press, 2009)
Other texts are listed on the web
Copies of other books are in the bookstore, on reserve in the library, and available for browsing here

Grading, honor code, etc.

Worst HW score dropped; all other HW scores count equally
2/3 homework average + 1/3 final exam, OR 1/3 homework average + 2/3 final exam, whichever is better
Try all problems on your own first; OK to help each other with HW, but:
– Acknowledge help
– Write answers in your own words
Do not collaborate on the final exam
Honor code taken seriously

Course overview

Diagnosis
– Theory
– Inter-rater reliability
– Dichotomous tests
– Multilevel tests
– Studies of tests
– Combining tests
Screening and prognostic tests
Treatments: randomized trials
Alternatives to randomized trials
P-values and confidence intervals; Bayes’ theorem
Clinicians and probability

Diagnostic process

Why do we want to assign a name to this person’s illness?
Different reasons lead to different classification schemes

Examples

Acute nephrotic syndrome
Acute leukemia
Attention deficit disorder
Dysuria worth a course of antibiotics
SLUBI = self-limited undiagnosed benign illness

Simplified Generic Decision Problem

Patient either has the disease or not
If D+, net benefit of treatment
If D–, better not to treat
(“Treat” could include doing more tests)

Simplifying assumptions (often wrong)

Test results are dichotomous
– Most tests have more than two possible answers
Disease states are dichotomous
– Many diseases occur on a spectrum
– There are many kinds of “nondisease”

Evaluating diagnostic tests

Reliability
Accuracy
Usefulness

Today we do reliability

Types of variables

Categorical
– Dichotomous – 2 values
– Nominal – no intrinsic ordering
– Ordinal – intrinsic ordering

Continuous (infinite number of values) vs. discrete (limited number of values)

Measuring interobserver agreement for categorical variables

                          Gallop heard    No gallop heard    Total,
                          by Observer B   by Observer B      Observer A
Gallop heard by A               20               15              35
No gallop heard by A            10               55              65
Total, Observer B               30               70             100

What is agreement?

Concordance rate

What percent of the time do the two observers agree (exactly)?
Advantage: easy to understand
Disadvantage: may be misleading if observers agree on the prevalence of the abnormality
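For example, in the gallop table above the observers agree on 20 + 55 = 75 of 100 patients, so the concordance rate is 75%.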

Concordance rate problem

                          Gallop heard    No gallop heard    Total,
                          by Observer B   by Observer B      Observer A
Gallop heard by A                0                5               5
No gallop heard by A             5               90              95
Total, Observer B                5               95             100
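Here the concordance rate is (0 + 90)/100 = 90%, yet the two observers never agree on a single gallop: all of the apparent agreement comes from agreeing that gallops are rare. (Using the kappa statistic defined below, expected agreement is (5 × 5 + 95 × 95)/100² = 90.5%, so kappa = (0.90 − 0.905)/(1 − 0.905) ≈ −0.05, slightly worse than chance.)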

Unbalanced Disagreement

Lesion #   Rater A   Rater B
   1          S         S
   2          S         S
   3          S         M
   4          S         M
   5          S         M
   6          M         M
   7          M         L
   8          L         L
   9          L         L
  10          L         L

                Rater B
Rater A      S     M     L    Total
S            2     3     0      5
M            0     1     1      2
L            0     0     3      3
Total        2     4     4     10

What is going on here?

Look for lack of balance above and below the diagonal
This pattern results when the observers have different thresholds
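For instance, a minimal Stata sketch that reproduces this analysis (do-file style; the variable names ratera and raterb are made up, with S/M/L coded as 1/2/3):

* Re-enter the 10 lesions above (1=S, 2=M, 3=L; hypothetical variable names)
clear
input ratera raterb
1 1
1 1
1 2
1 2
1 2
2 2
2 3
3 3
3 3
3 3
end
kap ratera raterb

With these data, observed agreement is 6/10 = 60%; kap also reports kappa, which works out to about 0.43.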

Definition of Kappa

The amount of agreement beyond what would be expected by chance*

Formula:

Kappa = (Observed agreement – Expected agreement) / (1 – Expected agreement)

Practice:
– Obs = 90%, Exp = 80%, K = ?
– Obs = 70%, Exp = 60%, K = ?
– Obs = 60%, Exp = 70%, K = ?

*Given the observed marginals
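Worked answers, using the formula above: (0.90 − 0.80)/(1 − 0.80) = 0.50; (0.70 − 0.60)/(1 − 0.60) = 0.25; and (0.60 − 0.70)/(1 − 0.70) ≈ −0.33. The last case shows that kappa is negative when observed agreement is worse than chance.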

Calculation of Expected Agreement from Marginals

                          Gallop heard    No gallop heard    Total,
                          by Observer B   by Observer B      Observer A
Gallop heard by A               20               15              35
No gallop heard by A            10               55              65
Total, Observer B               30               70             100
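Working through this table: under chance, the expected counts on the diagonal are 35 × 30/100 = 10.5 (gallop/gallop) and 65 × 70/100 = 45.5 (no gallop/no gallop), so expected agreement = (10.5 + 45.5)/100 = 56%. With observed agreement of 75%, kappa = (0.75 − 0.56)/(1 − 0.56) ≈ 0.43. A minimal Stata sketch that reproduces this from the cell counts (do-file style; the variable names obsa, obsb, and n are made up):

* Re-enter the 2x2 gallop table as cell counts (hypothetical variable names)
clear
input obsa obsb n
1 1 20
1 0 15
0 1 10
0 0 55
end
* n enters as a frequency weight
kap obsa obsb [fweight=n]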

GCS Eye Opening: Observed

                                      Emergency Physician #2
Emergency Physician #1   None   To Pain   To Command   Spontaneous   Total
None                      11       2          0             4          17
To Pain                    4       1          2             0           7
To Command                 0       3          8             3          14
Spontaneous                2       1          7            68          78
Total                     17       7         17            75         116

GCS Eye Opening: Expected

                                      Emergency Physician #2
Emergency Physician #1   None   To Pain   To Command   Spontaneous   Total
None                      2.5     1.0        2.5          11.0         17
To Pain                   1.0     0.4        1.0           4.5          7
To Command                2.1     0.8        2.1           9.1         14
Spontaneous              11.4     4.7       11.4          50.4         78
Total                    17       7         17            75          116

Example cell: 17 × 78/116 = 1326/116 = 11.4 (the expected count when Physician #1 says Spontaneous and Physician #2 says None)
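Carrying the same calculation through here: observed agreement = (11 + 1 + 8 + 68)/116 ≈ 0.76; expected agreement = (2.5 + 0.4 + 2.1 + 50.4)/116 ≈ 0.48; so kappa ≈ (0.76 − 0.48)/(1 − 0.48) ≈ 0.54.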

Why does multiplying row total by column total and dividing by N give you the expected agreement?
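One way to see it: if the two raters classified patients independently, the probability of landing in row i and column j would be (row i total/N) × (column j total/N); multiplying by the N patients gives the expected cell count, row total × column total/N. Summing the expected counts on the diagonal and dividing by N gives the expected agreement.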

Weighted Kappa

Weighted kappa
– Linear
– Quadratic
– Custom

Real-life illustration: rating of a neurological examination
Types of weights – Stata illustration:

. tab ex1 ex2
. kap ex1 ex2, w(w)
. kap ex1 ex2, w(w2)

(See Appendix 2.1)
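For reference, Stata's prerecorded weights: w gives linear weights, 1 − |i − j|/(k − 1), and w2 gives quadratic weights, 1 − [(i − j)/(k − 1)]², where k is the number of categories and i, j index the two raters' categories. Near-misses therefore earn partial credit, with quadratic weights penalizing large disagreements more heavily. Custom weights can be defined with kapwgt; a minimal sketch for a hypothetical 3-level scale (the weight name "mine" and the values below are made up):

. kapwgt mine 1 \ .8 1 \ 0 .5 1
. kap ex1 ex2, wgt(mine)

The kapwgt matrix lists the lower triangle row by row, with 1s on the diagonal for exact agreement.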

What does observed kappa depend upon?

How well people agree
SPECTRUM within classifications
– E.g., are the abnormal ones VERY abnormal?
– Difficult cases can be excluded or over-sampled
PREVALENCE of classifications by the various observers (and whether they agree on prevalence)
Chance (random error; people can get lucky/unlucky)
Weighting scheme used

Wireless Internet Access

Key is n2xa8!wr
