
Page 1: Measuring Forecaster Performance

NATIONAL DEFENSE INTELLIGENCE COLLEGE

Measuring Forecaster Performance

Lt Col James E. Kajdasz, Ph.D., USAF

Page 2: Measuring Forecaster Performance

Scholarship of Intelligence Analysis

• “A comprehensive review of the literature indicates that while much has been written, largely there has not been a progression of thinking relative to the core aspect and competencies of doing intelligence analysis.” (Mangio & Wilkinson, 2008)

• “Do [they] teach structured methods because they are the best way to do analysis, or do they teach structured methods because that’s what they can teach?” (Marrin, 2009)

Page 3: Measuring Forecaster Performance

Grade forecasters on % correct?

• We could grade forecaster accuracy like a true/false test, with yes/no answers:
  – Will Qadhafi still be in Libya at this time next year? No
  – Will the government of Yemen fall in the next year? No
  – Will I still be driving my 2001 Corolla in the year 2020? Yes
• Wait until outcomes occur or don't occur, and calculate the percent of correct forecasts.
• Compare Forecaster A to Forecaster B by seeing who has the higher % correct, as in the sketch below.
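A minimal sketch of this grading scheme in Python; the three forecasts are the slide's examples, while the outcomes are assumed here purely for illustration:

```python
# Sketch: grading yes/no forecasts by percent correct.
forecasts = ["No", "No", "Yes"]   # the three example forecasts above
outcomes  = ["No", "Yes", "Yes"]  # assumed resolutions, illustration only

correct = sum(f == o for f, o in zip(forecasts, outcomes))
print(f"Percent correct: {correct / len(forecasts):.0%}")  # -> 67%
```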

Page 4: Measuring Forecaster Performance

What about probabilistic judgments?

• When there is a high level of uncertainty, laypeople and even experts often qualify judgments:
  – Will Qadhafi still be in Libya at this time next year? No (70% confidence)
  – Will the government of Yemen fall in the next year? No (60% confidence)
  – Will I still be driving my 2001 Corolla in the year 2020? Yes (95% confidence)

Page 5: Measuring Forecaster Performance

What about probabilistic judgments?

[Figure: an 11-point probability scale from 0 to 1.0 with verbal anchors: Impossible, Highly unlikely, Somewhat unlikely, As likely as other two possibilities combined, Somewhat likely, Highly likely, Certainty. Tetlock, 2005]

Page 6: Measuring Forecaster Performance

Let's compare analysts…

• So which analyst performed best?
• It's hard to say… We need a summary statistic of total performance.

Probability assigned:
Event   Occurred?   Analyst 1   Analyst 2   Analyst 3
  1      No (0)        0.0         0.0         0.1
  2      Yes (1)       0.9         0.7         0.7
  3      No (0)        0.1         0.3         0.0
  4      Yes (1)       0.7         0.5         0.5
  5      Yes (1)       0.9         1.0         1.0

Page 7: Measuring Forecaster Performance

Mean Probability Score

• Probability Score, or Brier Score:

  PS = (Estimate − Outcome)²

  – Estimate: the probability provided by the forecaster (.00 – 1.00)
  – Outcome: 0 if the event did not occur; 1 if the event did occur
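As a sketch (not from the original slides; the function name is mine), the score translates directly into code:

```python
def probability_score(estimate: float, outcome: int) -> float:
    """Brier probability score for one forecast: (estimate - outcome)**2.

    estimate: forecaster's probability that the event occurs (0.00-1.00)
    outcome:  1 if the event occurred, 0 if it did not
    """
    return (estimate - outcome) ** 2

# Worked example from the next slide: a 70% forecast, and the event occurs.
print(round(probability_score(0.70, 1), 2))  # -> 0.09
```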

Page 8: Measuring Forecaster Performance

Mean Probability Score

• Probability Score or Brier Score:
  – Forecaster says there is a 70% probability X will occur.
  – X occurs.

  PS = (Estimate − Outcome)² = (.70 − 1)² = (−.30)² = .09

Page 9: Measuring Forecaster Performance

Mean Probability Score

• Mean Probability Score, or Mean Brier Score: average the probability scores across all of a forecaster's judgments.

  PS₁ = (.70 − 1)² = (−.30)² = .09
  PS₂ = (.50 − 0)² = (.50)² = .25
  PS₃ = (.10 − 0)² = (.10)² = .01

  Mean PS = (.09 + .25 + .01) / 3 ≈ .12

Page 10: Measuring Forecaster Performance

Let's compare analysts…

Probability assigned:
Event   Occurred?   Analyst 1   Analyst 2   Analyst 3
  1      No (0)        0.0         0.0         0.1
  2      Yes (1)       0.9         0.7         0.7
  3      No (0)        0.1         0.3         0.0
  4      Yes (1)       0.7         0.5         0.5
  5      Yes (1)       0.9         1.0         1.0

Mean PS                0.02        0.09        0.07

• Lower PS is better, so Analyst 1 performed best overall.
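A short sketch reproducing the mean probability scores in the table; the data are the table's values:

```python
# Reproduce the mean probability scores from the table above.
outcomes = [0, 1, 0, 1, 1]  # events 1-5: 1 if the event occurred
analysts = {
    "Analyst 1": [0.0, 0.9, 0.1, 0.7, 0.9],
    "Analyst 2": [0.0, 0.7, 0.3, 0.5, 1.0],
    "Analyst 3": [0.1, 0.7, 0.0, 0.5, 1.0],
}

for name, probs in analysts.items():
    mean_ps = sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)
    print(f"{name}: mean PS = {mean_ps:.2f}")
# -> Analyst 1: 0.02, Analyst 2: 0.09, Analyst 3: 0.07 (lower is better)
```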

Page 11: Measuring Forecaster Performance

Components of Total Forecaster Error

• Several things contribute to overall error, not all of which can be controlled by the forecaster.

[Diagram: Total Forecasting Error (mean PS) breaks down into Calibration Errors, Discrimination Errors, and the Variance of the Outcome]

Page 12: Measuring Forecaster Performance

Decomposing Mean Probability Score

  Mean PS = Var(d) + (Bias)² + [Var(d) × Slope] × (Slope − 2) + Scatter

The Bias, Slope, and Scatter components are defined on the next three slides.

Page 13: Measuring Forecaster Performance


Decomposing PS: Bias

  Bias = f̄ − d̄

Where:
  f̄ = mean estimate
  d̄ = mean outcome

Arkes, Dawson, Speroff, et al. (1995)

[Figure: plot of Estimated Probability of Survival (f) against Outcome Index (d)]

Page 14: Measuring Forecaster Performance


Decomposing PS: Slope

  Slope = f̄₁ − f̄₀

Where:
  f̄₁ = mean estimate when the outcome was 1
  f̄₀ = mean estimate when the outcome was 0

Arkes, Dawson, Speroff, et al. (1995)

[Figure: plot of Estimated Probability of Survival (f) against Outcome Index (d)]

Page 15: Measuring Forecaster Performance


Decomposing PS: Scatter

  Scatter = the weighted average of Var(f₁) and Var(f₀), weighted by the number of occasions in each outcome group

Where:
  Var(f₁) = variance of the estimates when the outcome was 1
  Var(f₀) = variance of the estimates when the outcome was 0

Arkes, Dawson, Speroff, et al. (1995)

[Figure: plot of Estimated Probability of Survival (f) against Outcome Index (d)]
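Tying the last four slides together, here is a hedged sketch (function names are mine) that computes Bias, Slope, Scatter, and Var(d) from raw forecasts and then reassembles the mean PS through the identity on the "Decomposing Mean Probability Score" slide. It assumes population variances (divide by n) and the occasion-weighted Scatter above, which make the identity exact:

```python
def mean(xs):
    return sum(xs) / len(xs)

def pvar(xs):
    """Population variance (divide by n)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def decompose(f, d):
    """f: probability estimates; d: 0/1 outcomes (same length)."""
    f1 = [fi for fi, di in zip(f, d) if di == 1]  # estimates when event occurred
    f0 = [fi for fi, di in zip(f, d) if di == 0]  # estimates when it did not
    bias = mean(f) - mean(d)      # Bias  = f-bar  - d-bar
    slope = mean(f1) - mean(f0)   # Slope = f1-bar - f0-bar
    scatter = (len(f1) * pvar(f1) + len(f0) * pvar(f0)) / len(f)
    var_d = pvar(d)
    # Identity from the decomposition slide:
    mean_ps = var_d + bias**2 + (var_d * slope) * (slope - 2) + scatter
    return bias, slope, scatter, var_d, mean_ps

# Analyst 1 from the earlier table: the reassembled mean PS comes out to
# 0.024, matching the direct Brier-score calculation.
print(decompose([0.0, 0.9, 0.1, 0.7, 0.9], [0, 1, 0, 1, 1]))
```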

Page 16: Measuring Forecaster Performance

Patients vs. Doctors

• Comparison of patients' and doctors' survival-probability estimates (Arkes, Dawson, Speroff, et al., 1995):

            Patients   Doctors
  PS          .23        .18
  Bias        .13       −.11
  Slope       .13        .26
  Scatter     .05        .05

[Figure: plots of Estimated Probability of Survival (f) against Outcome Index (d) for patients and for doctors]

Page 17: Measuring Forecaster Performance

Prediction Markets

Page 18: Measuring Forecaster Performance

A-priori Hypotheses:

• H1: Discrimination will improve as the event nears.
  – The slope measure will increase over time.
• H2: Scatter will decrease as the event nears.
  – The scatter measure will get smaller over time.
• H3: Analysts will be biased toward predicting the status quo.
  – The bias measure will be negative.

Page 19: Measuring Forecaster Performance

T-70 Days

Page 20: Measuring Forecaster Performance

T-60 Days

Page 21: Measuring Forecaster Performance

T-50 Days

Page 22: Measuring Forecaster Performance

T-40 Days

Page 23: Measuring Forecaster Performance

T-30 Days

Page 24: Measuring Forecaster Performance

T-20 Days

Page 25: Measuring Forecaster Performance

T-10 Days

Page 26: Measuring Forecaster Performance

Total Error over Time

• PS is a measure of overall error; low PS is better.
• The graph suggests a curvilinear relationship between error and time.

[Figure: mean PS over time]

Page 27: Measuring Forecaster Performance

Components of Error

• PS is composed of Bias, Slope, Scatter, and the Variance of the Outcome.
• The graph suggests the decrease in error is primarily due to improvement in slope.
• Slope is a measure of discrimination; high slope is better.

[Figure: components of error over time]

Page 28: Measuring Forecaster Performance

Modeling Slope Over Time

• The observed slope was modeled over time.
• The curvilinear relationship was modeled with Days and Days² terms.
• Adj. R² = .834, p = .01.
• H1 supported: discrimination improves as the event date approaches.

[Figure: observed and fitted slope over time; vertical axis shows Slope from −.2 to .6]
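A sketch of the kind of quadratic fit described above; the (days, slope) values are hypothetical, since the study's raw data are not reproduced here:

```python
import numpy as np

# Hypothetical slope measurements at T-70 ... T-10 days (illustrative only).
days  = np.array([-70, -60, -50, -40, -30, -20, -10])
slope = np.array([-0.10, 0.00, 0.05, 0.15, 0.30, 0.45, 0.55])

# Curvilinear model: slope ~ b0 + b1*Days + b2*Days^2
b2, b1, b0 = np.polyfit(days, slope, deg=2)
print(f"slope = {b0:.3f} + {b1:.4f}*Days + {b2:.6f}*Days^2")
```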

Page 29: Measuring Forecaster Performance

Scatter Over Time

• Scatter is a measure of the 'spread' of the probability estimates.
• A slight linear trend was not significant.
• H2 not supported.

Page 30: Measuring Forecaster Performance

Bias Over Time

• Questions were recoded so that probability '0' represented a continuation of the status quo and probability '1' represented a change in the status quo.
• Analysts were biased toward predicting a change in the status quo.
  – Indicated by positive bias numbers; t(6) = 4.73, p < .01.
• H3 not supported, BUT the results were significant in the direction opposite that hypothesized.
• The linear trend over time was not statistically significant.

Page 31: Measuring Forecaster Performance


Lt Col James E. Kajdasz, Ph.D., [email protected]

The views expressed in this presentation are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.