
Page 1

Verification Continued…

Holly C. Hartmann

Department of Hydrology and Water Resources, University of Arizona

[email protected]

RFC Verification Workshop, 08/14/2007

Page 2

Agenda

1. Introduction to Verification
- Applications, Rationale, Basic Concepts
- Data Visualization and Exploration
- Deterministic Scalar Measures

2. Categorical Measures – KEVIN WERNER
- Deterministic Forecasts
- Ensemble Forecasts

3. Diagnostic Verification
- Reliability
- Discrimination
- Conditioning/Structuring Analyses

4. Lab Session/Group Exercise
- Developing Verification Strategies
- Connecting to Forecast Operations and Users

Page 3

Probabilistic Ensemble Forecasts

From: California-Nevada River Forecast Center

Page 4

Probabilistic Ensemble Forecasts

From: California-Nevada River Forecast Center

Page 5

Probabilistic Ensemble Forecasts

From: A. Hamlet, University of Washington

Page 6

From: A. Hamlet, University of Washington

Page 7

Probabilistic Ensemble Forecasts

From: A. Hamlet, University of Washington

Page 8

• Identifies systematic flaws of an ensemble prediction system.
• Shows effectiveness of the ensemble distribution in sampling the observations.
• Does not indicate whether the ensemble will be of practical use.

Talagrand Diagram – Also Called Ranked Histogram

Page 9

With only one ensemble member ( | ), all (100%) observations will fall "outside".

With two ensemble members, two out of three observations (2/3 = 67%) should fall outside.

With three ensemble members, two out of four observations (2/4 = 50%) should fall outside.

For any number N of ensemble members (which define N + 1 bins), 2/(N + 1) of the observations should fall outside the ensemble.

• Identifies systematic flaws of an ensemble prediction system.
• Shows effectiveness of the ensemble distribution in sampling the observations.
• Does not indicate whether the ensemble will be of practical use.

Principle Behind Talagrand Diagram

Talagrand Diagram – Also Called Ranked Histogram

Adapted from A. Persson, 2006
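Stated compactly (a restatement of the principle above, not an addition from the slides): if the observation is statistically indistinguishable from the N ensemble members, every rank bin is equally likely:

\[
P(\text{obs in bin } i) = \frac{1}{N+1}, \quad i = 1, \dots, N+1,
\qquad
P(\text{obs outside the ensemble}) = \frac{2}{N+1}.
\]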

Page 10

Talagrand Diagram Computation Example

YEAR   E1   E2   E3   E4   OBS
1981   42   74   82   90   112
1982   65  143  223  227   206
1983   82  192  295  300   301
1984  211  397  514  544   516
1985  142  291  349  356   348
1986  114  277  351  356    98
1987   98  170  204  205   156
1988   69  169  229  236   245
1989   94  219  267  270   233
1990   59  175  244  250   248
1991  108  189  227  228   227
1992   94  135  156  158   167

Four sample ensemble members (E1–E4) for daily flow forecasts (produced from reforecasts using carryover each year).

Step 1: Rank members lowest to highest for each year. Four members define five bins.

Step 2: Determine which bin the corresponding observation falls into.

Step 3: Tally how many observations fall in each bin.

Step 4: Plot the frequency of observations for each ranked bin.

[Slide animation: bin numbers are assigned year by year; the completed assignment and tally appear on Page 11.]

Page 11

Talagrand Diagram Computation Example

(Ensemble/observation table and Steps 1–4 as on Page 10.)

Bin # by year (1981–1992): 5, 3, 5, 4, 3, 1, 2, 5, 3, 4, 4, 5

Bin # Tally: Bin1 = 1, Bin2 = 1, Bin3 = 3, Bin4 = 3, Bin5 = 4
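The steps above can be scripted directly. A minimal Python sketch (illustrative, not from the original slides) that reproduces the bin assignments and tally from the table on Page 10:

```python
import numpy as np

# Ensemble members (already sorted low to high) and observations, 1981-1992
ens = np.array([
    [ 42,  74,  82,  90], [ 65, 143, 223, 227], [ 82, 192, 295, 300],
    [211, 397, 514, 544], [142, 291, 349, 356], [114, 277, 351, 356],
    [ 98, 170, 204, 205], [ 69, 169, 229, 236], [ 94, 219, 267, 270],
    [ 59, 175, 244, 250], [108, 189, 227, 228], [ 94, 135, 156, 158],
])
obs = np.array([112, 206, 301, 516, 348, 98, 156, 245, 233, 248, 227, 167])

n_members = ens.shape[1]  # 4 members -> 5 bins
# Steps 1-2: bin = 1 + number of members the observation equals or exceeds
# (ties counted upward, matching Bin 4 for 1991, where obs = E3 = 227)
bins = (obs[:, None] >= ens).sum(axis=1) + 1
# Step 3: tally observations per bin
tally = np.bincount(bins, minlength=n_members + 2)[1:]
print(bins)   # [5 3 5 4 3 1 2 5 3 4 4 5]
print(tally)  # [1 1 3 3 4]
# Step 4: a bar plot of `tally` against bins 1-5 gives the Talagrand diagram
```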

Page 12

Talagrand Diagram

[Bar chart: frequency of observations in each of the five categories defined by the ordered ensemble members; bar heights 1, 1, 3, 3, 4 for Bins 1–5, per the tally on Page 11.]

(Ensemble/observation table, bin assignments, and tally as on Pages 10–11.)

Page 13

Talagrand Diagram: 25 traces/ensemble, 375 observations

Example: "U-shaped" – observations too often fall outside the ensemble; indicates the ensemble spread is too small.

Example: "L-shaped" – observations are too often larger (smaller) than the ensemble; indicates an under- (over-) forecasting bias.

Example: "N-shaped" (dome-shaped) – observations too rarely fall outside the ensemble; indicates the ensemble spread is too large.

Example: "Flat" – observations fall uniformly across the ensemble; indicates an appropriately sized ensemble distribution.

[Four example rank histograms (relative frequency of analysis vs. category defined by the ordered ensemble members, 25–26 categories each) illustrating the U-shaped, L-shaped, N-shaped, and flat cases.]

Page 14

Talagrand Diagram Example: Interpretation?

(Ensemble/observation table, bin assignments, and tally as on Pages 10–11.)

[The resulting Talagrand diagram, as on Page 12.]

???

Page 15

Distributions-oriented Forecast Evaluation leads to Diagnostic Verification

It’s all about conditional and marginal distributions!

P(O|F), P(F|O), P(F), P(O)

Reliability, Discrimination, Sharpness, Uncertainty

Page 16

Forecast Reliability -- P(O|F)

For a specified forecast condition, what does the distribution of observations look like?

[Two schematic plots: relative frequency of observed vs. forecasted probability, both axes from 0 to 1.]

User perspective: “When you say 20% chance of flood flows, how often do flood flows actually happen?”

User perspective: “When you say 80% chance of flood flows, how often do flood flows actually happen?”

Page 17

Good reliability – close to the diagonal.

Sharpness diagram (p(f)) – a histogram of the forecasts in each probability bin; shows the marginal distribution of the forecasts.

The reliability diagram is conditioned on the forecasts. That is, given that X was predicted, what was the outcome?

Reliability (Attributes) Diagram – Reliability, Sharpness

Page 18

Reliability Diagram Example Computation

(Ensemble/observation table as on Page 10.)

Step 1: Choose a threshold value to base the probability forecasts on. For simplicity we'll choose the mean forecast over all years and all ensemble members (≈ 208).

Page 19

Reliability Diagram Example Computation

(Ensemble/observation table as on Page 10.)

Step 2: Choose how many forecast probability categories to use (5 here: 0,.25,.5,.75,1)

Step 3: For each forecast, calculate the forecast probability below the threshold value.

P(peakfor < 208), filled in year by year: 1.0, 0.5, 0.5, 0.0, 0.25, 0.25, 1.0, 0.5, 0.5, 1.0 (completed on Page 20)

Page 20

Reliability Diagram Example Computation

(Ensemble/observation table as on Page 10.)

Step 2: Choose how many forecast probability categories to use (5 here: 0,.25,.5,.75,1)

Step 3: For each forecast, calculate the forecast probability below the threshold value.

P(peakfor < 208) by year:
1981: 1.0, 1982: 0.5, 1983: 0.5, 1984: 0.0, 1985: 0.25, 1986: 0.25, 1987: 1.0, 1988: 0.5, 1989: 0.25, 1990: 0.5, 1991: 0.5, 1992: 1.0
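These probabilities follow mechanically from the ensemble table. A short Python sketch (illustrative, not from the slides; `ens` as defined in the Page 11 sketch):

```python
# Step 1: threshold = mean over all years and all ensemble members
threshold = ens.mean()                     # about 207.7, rounded to 208 on the slide
# Step 3: forecast probability = fraction of members below the threshold
p_below = (ens < threshold).mean(axis=1)   # one probability per year
print(p_below)  # [1. 0.5 0.5 0. 0.25 0.25 1. 0.5 0.25 0.5 0.5 1.]
```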

Page 21

(Observations and P(peakfor < 208) values as on Page 20.)

Step 4: Group the observations into groups of equal forecast probability (or, more generally, into forecast probability categories).

P(peakfor < 208) = 0.0: 516
P(peakfor < 208) = 0.25: 348, 98, 233
P(peakfor < 208) = 0.5: 206, 301, 245, 248, 227
P(peakfor < 208) = 0.75: N/A
P(peakfor < 208) = 1.0: (filled in on Page 22)

Reliability Diagram Example Computation

Page 22

(Observations and P(peakfor < 208) values as on Page 20.)

Step 4: Group the observations into groups of equal forecast probability (or, more generally, into forecast probability categories).

P(peakfor < 208) = 0.0: 516
P(peakfor < 208) = 0.25: 348, 98, 233
P(peakfor < 208) = 0.5: 206, 301, 245, 248, 227
P(peakfor < 208) = 0.75: N/A
P(peakfor < 208) = 1.0: 112, 156, 167

Reliability Diagram Example Computation

Page 23

Step 5: For each group, calculate the frequency of observations below the threshold value, 208 cfs.

P(peakfor < 208) = 0.0: 516
P(peakfor < 208) = 0.25: 348, 98, 233
P(peakfor < 208) = 0.5: 206, 301, 245, 248, 227
P(peakfor < 208) = 0.75: N/A
P(peakfor < 208) = 1.0: 112, 156, 167

P(obs peak < 208 given [P(peakfor < 208) = 0.0]) = 0/1 = 0.0
P(obs peak < 208 given [P(peakfor < 208) = 0.25]) = 1/3 = 0.33
P(obs peak < 208 given [P(peakfor < 208) = 0.5]) = 1/5 = 0.2
P(obs peak < 208 given [P(peakfor < 208) = 0.75]) = 0/0 = N/A
P(obs peak < 208 given [P(peakfor < 208) = 1.0]) = (completed on Page 24)

Reliability Diagram Example Computation

Page 24

Step 5: For each group, calculate the frequency of observations below the threshold value, 208 cfs.

P(peakfor < 208) = 0.0: 516
P(peakfor < 208) = 0.25: 348, 98, 233
P(peakfor < 208) = 0.5: 206, 301, 245, 248, 227
P(peakfor < 208) = 0.75: N/A
P(peakfor < 208) = 1.0: 112, 156, 167

P(obs peak < 208 given [P(peakfor < 208) = 0.0]) = 0/1 = 0.0
P(obs peak < 208 given [P(peakfor < 208) = 0.25]) = 1/3 = 0.33
P(obs peak < 208 given [P(peakfor < 208) = 0.5]) = 1/5 = 0.2
P(obs peak < 208 given [P(peakfor < 208) = 0.75]) = 0/0 = N/A
P(obs peak < 208 given [P(peakfor < 208) = 1.0]) = 3/3 = 1.0

Reliability Diagram Example Computation
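Steps 2–5 can be scripted as follows, a sketch under the same assumptions, reusing `ens` and `obs` from the Page 11 example (illustrative, not from the slides):

```python
import numpy as np

threshold = 208
p_below = (ens < threshold).mean(axis=1)   # Step 3: forecast probabilities
categories = [0.0, 0.25, 0.5, 0.75, 1.0]   # Step 2: probability categories

# Steps 4-5: within each category, frequency of observations below threshold
for p in categories:
    in_cat = p_below == p                  # exact comparison is safe: probs are k/4
    n = int(in_cat.sum())
    if n == 0:
        print(f"P(for) = {p:.2f}: N/A (no forecasts)")
    else:
        obs_freq = (obs[in_cat] < threshold).mean()
        print(f"P(for) = {p:.2f}: obs freq = {obs_freq:.2f} (n = {n})")
# Step 6: plot obs_freq vs. p with the 45-degree diagonal for reference;
# Step 7: add a sharpness histogram of the n values per category.
```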

Page 25

Step 6: Plot the centroid of each forecast category (just the category's point value in our case) on the x-axis against the observed frequency within that category on the y-axis. Include the 45-degree diagonal for reference.

Reliability Diagram Example Computation

Page 26

Step 7: Include a sharpness plot showing the number of observation/forecast pairs in each category.

Reliability Diagram Example Computation

Page 27

Good reliability – close to the diagonal.

Sharpness diagram (p(f)) – a histogram of the forecasts in each probability bin; shows the marginal distribution of the forecasts.

Good resolution – a wide range of observed frequencies corresponding to the forecast probabilities.

Skill – related to the Brier Skill Score, in reference to sample climatology (not historical climatology).

The reliability diagram is conditioned on the forecasts. That is, given that X was predicted, what was the outcome?

Reliability Diagram – Reliability, Sharpness – P(O|F)

Page 28

Overall relative frequency of observations (sample climatology).

Points closer to the perfect-reliability line than to the no-resolution line: these subsamples of the probabilistic forecasts contribute positively to overall skill (as defined by the Brier Skill Score) in reference to sample climatology.

No-skill line: halfway between the perfect-reliability line and the no-resolution line, with sample climatology as the reference.

Attributes Diagram – Reliability, Resolution, Skill/No-skill
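For reference, the standard definitions behind this slide (following Wilks 1995; not shown on the slide itself), with forecast probabilities \(p_k\), binary outcomes \(o_k\), and sample climatology \(\bar{o}\):

\[
\mathrm{BS} = \frac{1}{n}\sum_{k=1}^{n}\left(p_k - o_k\right)^2,
\qquad
\mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{\mathrm{clim}}}.
\]

The perfect-reliability line is \(y = x\), the no-resolution line is \(y = \bar{o}\), and the no-skill line is their midpoint, \(y = (x + \bar{o})/2\); points on the perfect-reliability side of the no-skill line contribute positively to the BSS.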

Page 29

[Six example reliability diagrams from Wilks (1995): climatology; minimal resolution; underforecasting; good resolution at the expense of reliability; reliable forecasts of a rare event; small sample size.]

Source: Wilks (1995)

Interpretation of Reliability Diagrams

Page 30

Interpretation of Reliability Diagrams

Reliability – P(O|F): Does the frequency of occurrence match your probability statement? Identifies conditional bias.

[Schematic: relative frequency of observations vs. forecasted probability, with the horizontal no-resolution line marked.]

Page 31

EVS Reliability Diagram Examples

25th Percentile Observed Flows (low flows)

Sharp forecasts, but low resolution

Arkansas-Red Basin, 24-hr flows, lead time 1-14 days

85th Percentile Observed Flows (high flows)

Good reliability at shorter lead times; longer lead times miss high events

From: J. Brown, EVS Manual

Page 32

Historical seasonal water supply outlooks

Colorado River Basin

Morrill, Hartmann, and Bales, 2007

Page 33

Reliability: Colorado Basin ESP Seasonal Supply Outlooks

[Reliability diagrams (relative frequency of observations vs. forecast probability) for outlooks issued Jan 1, Mar 1, Apr 1, and Jun 1: LC JM (5 mo. lead), LC MM (3 mo. lead), LC AM (2 mo. lead), UC JJy (7 mo. lead), UC AJy (4 mo. lead), UC JnJy (2 mo. lead); categories: high 30%, mid 40%, low 30%.]

1) Few high-probability forecasts; good reliability between 10–70% probability; reliability improves.

2) These months show the best reliability; low resolution limits reliability.

3) Reliability decreases for later forecasts as resolution increases; UC is good at the extremes.

Franz, Hartmann, and Sorooshian, 2003

Page 34

For a specified observation category, what do the forecast distributions look like?

Discrimination – P(F|O)

"When dry conditions happen… what do the forecasts usually look like?"

You sure hope that forecasts look different when there’s a drought, compared to when there’s a flood!

Page 35

Discrimination – P(F|O)

You sure hope that forecasts look different when there’s a drought, compared to when there’s a flood!

Example: NWS CPC Seasonal climate outlooks, sorted into DRY cases (lowest tercile), 1995-2001, all forecasts, all lead-times

Good discrimination! Not much discrimination!

[Two schematic plots: relative frequency of indicated forecast vs. forecasted probability (0.00, 0.33, 1.00), each with curves for probability of dry, probability of wet, and climatology; in one the curves separate clearly (good discrimination), in the other they largely overlap (not much discrimination).]

Page 36

Discrimination: Lower Colorado ESP Supply Outlooks

When unusually low flows happened… P(F | low flows), low < 30th percentile.

[Plot: relative frequency of forecasts vs. forecast probability for the Jan 1 outlook of Jan–May flows, with curves for the high, mid-, and low categories.]

There is some discrimination… Early forecasts warned "High flows less likely."

Franz, Hartmann, and Sorooshian (2003)

Page 37

Discrimination: Lower Colorado ESP Supply Outlooks

When unusually low flows happened… P(F | low flows), low < 30th percentile.

[Plots: relative frequency of forecasts vs. forecast probability for the Jan 1 (Jan–May) and Apr 1 (Apr–May) outlooks, with curves for the high, mid-, and low categories.]

There is some discrimination… Early forecasts warned "High flows less likely."

Good discrimination… Forecasts were saying: 1) high and mid-flows less likely; 2) low flows more likely.

Franz, Hartmann, and Sorooshian (2003)

Page 38

Discrimination: Colorado Basin ESP Supply Outlooks

For observed flows in the lowest 30% of the historic distribution.

[Plots: relative frequency of forecasts vs. forecast probability (categories: high 30%, mid 40%, low 30%) for the Lower Colorado Basin Jan 1 Jan–May (5 mo. lead) and Apr 1 April–May (2 mo. lead) outlooks, and the Upper Colorado Basin Jan 1 Jan–July (7 mo. lead) and Jun 1 June–July (2 mo. lead) outlooks.]

1) High flows less likely.

2) No discrimination between mid and low flows.

3) Both UC and LC show good discrimination for low flows at a 2-month lead time.

Franz, Hartmann, and Sorooshian (2003)

Page 39

Historical seasonal water supply outlooks

Colorado River Basin

Page 40

The CDF of all observations is plotted, color-coded by tercile.

The forecast ensemble members are sorted into three groups according to which tercile their associated observation falls into.

The CDF for each group is plotted in the corresponding color (e.g., high is blue).

Discrimination: CDF Perspective

Credit: K. Werner

Page 41

In this case, there is relatively good discrimination, since the three conditional forecast CDFs separate from each other.

Discrimination

Credit: K. Werner

Page 42

Discrimination Example Computation

(Ensemble/observation table as on Page 10.)

Step 1: Order the observations and divide the ordered list into categories. Here we use terciles (low ≤ 167, 206 ≤ middle ≤ 245, high ≥ 248).

OBS tercile by year (1981–1992): Low, Middle, High, High, High, Low, Low, Middle, Middle, High, Middle, Low

Credit: K. Werner

Page 43

Discrimination Example Computation

(Ensemble/observation table and OBS terciles as on Page 42.)

Step 2: Group the forecast ensemble members according to OBS tercile.

Low-OBS forecasts: 42, 74, 82, 90, 114, 277, 351, 356, 98, 170, 204, 205, 94, 135, 156, 158

Credit: K. Werner

Page 44

Discrimination Example Computation

(Ensemble/observation table and OBS terciles as on Page 42.)

Step 2 (continued): Mid-OBS forecasts: 65, 143, 223, 227, 69, 169, 229, 236, 94, 219, 267, 270, … (completed on Page 45)

Credit: K. Werner

Page 45

Discrimination Example Computation

(Ensemble/observation table and OBS terciles as on Page 42.)

Step 2 (continued): Mid-OBS forecasts: 65, 143, 223, 227, 69, 169, 229, 236, 94, 219, 267, 270, 108, 189, 227, 228

Credit: K. Werner

Page 46

Discrimination Example Computation

(Ensemble/observation table and OBS terciles as on Page 42.)

Step 2 (continued): Hi-OBS forecasts: 82, 192, 295, 300, 142, 291, 349, 356, 59, 175, 244, 250, … (completed on Page 47)

Credit: K. Werner

Page 47

Discrimination Example Computation

(Ensemble/observation table and OBS terciles as on Page 42.)

Step 2 (continued): Hi-OBS forecasts: 82, 192, 295, 300, 211, 397, 514, 544, 142, 291, 349, 356, 59, 175, 244, 250

Credit: K. Werner

Page 48

Discrimination Example Computation

(Observations and OBS terciles as on Page 42.)

Step 3: Plot the all-observations CDF, color-coded by tercile (low ≤ 167, 206 ≤ middle ≤ 245, high ≥ 248).

Credit: K. Werner

Page 49

Step 4: Add the CDFs of the forecasts conditioned on observed tercile to the plot.

Low-OBS forecasts: 42, 74, 82, 90, 114, 277, 351, 356, 98, 170, 204, 205, 94, 135, 156, 158
Mid-OBS forecasts: 65, 143, 223, 227, 69, 169, 229, 236, 94, 219, 267, 270, 108, 189, 227, 228
Hi-OBS forecasts: 82, 192, 295, 300, 211, 397, 514, 544, 142, 291, 349, 356, 59, 175, 244, 250

Discrimination Example Computation

Credit: K. Werner

Page 50

Step 5: Discrimination is shown by the degree to which the conditional forecast CDFs are separated from each other.

In this case, high forecasts discriminate better than mid and low forecasts.

Discrimination Example Computation

Credit: K. Werner
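The whole CDF construction can be sketched in Python (an illustration, not K. Werner's code; `ens` and `obs` as defined in the Page 11 example):

```python
import numpy as np
import matplotlib.pyplot as plt

# Step 1: tercile label for each observation (0 = low, 1 = middle, 2 = high)
edges = np.quantile(obs, [1/3, 2/3])
labels = np.digitize(obs, edges)

# Steps 2-5: pool ensemble members by observed tercile, compare empirical CDFs
for lab, name in enumerate(["low", "middle", "high"]):
    pooled = np.sort(ens[labels == lab].ravel())        # conditional forecasts
    cdf = np.arange(1, pooled.size + 1) / pooled.size   # empirical CDF
    plt.step(pooled, cdf, label=f"forecasts | {name} obs")
plt.xlabel("Flow")
plt.ylabel("Cumulative probability")
plt.legend()  # separation between the three curves indicates discrimination
plt.show()
```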

Page 51

How well do April – July volume forecasts discriminate when they are made in Jan, Mar, and May?

Poor discrimination in Jan between forecasting high and medium flows. Best discrimination in May.

Discrimination

Credit: K. Werner

Page 52

Another way to look at discrimination is to use PDFs instead of CDFs.

The more separation between the PDFs, the better the discrimination.

Discrimination

Credit: K. Werner

Page 53

Deterministic forecasts

• traditional in hydrology

• sub-optimal for decision making

Common perspective

“Deterministic model simulations and probabilistic forecasts … are two entirely different types of products. Direct comparison of probabilistic forecasts with deterministic single valued forecasts is extremely difficult”

Comparing Deterministic & Probabilistic Forecasts

- Anonymous

Page 54

How can we compare deterministic and probabilistic forecasts?

Deterministic

Probabilistic

Source: XEFS Design Team, 2007

Option: Use the ensemble median with standard metrics – No!

Page 55

From: A. Hamlet, University of Washington

The ensemble mean minimizes error, but doesn’t represent the overall behavior.

“Pretend Determinism”

Page 56

What’s wrong with using ‘deterministic’ metrics?

Metrics using only the central tendency of each forecast PDF fail to distinguish between forecasts 1–3, but will identify 4 as inferior. Metrics that reward accuracy but punish spread will rank the forecast skill from 1 to 4.

[Figure: four forecast PDFs (labeled 1–4) plotted against the observed value.]

From: A. Hamlet, University of Washington

Page 57

How can we compare deterministic and probabilistic forecasts?

Deterministic

Probabilistic

Source: XEFS Design Team, 2007

Option: Use the ensemble median with standard metrics – No!

Page 58

Deterministic vs. Probabilistic Forecasts

[Schematic: PDF vs. flow Q, showing the climatology distribution, the forecast distribution, tercile boundaries (equal probability), the deterministic forecast, and the observation.]

Jack-knife calibration error = PDF of the error distribution; any quantiles can be determined from it.

Approach used by Morrill, Hartmann, and Bales (2007)
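A rough Python sketch of the dressing idea (hypothetical hindcast numbers; an illustration of the approach as described, not the authors' code):

```python
import numpy as np

# Hypothetical hindcast record: deterministic forecasts and observations
det_fcst = np.array([120., 310., 250., 180., 400., 275., 150., 330.])
observed = np.array([140., 290., 270., 160., 450., 260., 170., 300.])

# Calibration-error distribution; a full treatment would jack-knife
# (leave one case out) when estimating it
errors = observed - det_fcst

def dress(new_fcst, errors, probs):
    """Turn a single-valued forecast into quantiles by adding quantiles
    of the error distribution, so it can be verified probabilistically."""
    return new_fcst + np.quantile(errors, probs)

# e.g. tercile boundaries (equal probability) around a new forecast of 200
print(dress(200.0, errors, [1/3, 2/3]))
```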

Page 59

Lab Session -- Group Exercise

Choose a set of forecasts.

Develop strategies for verifying these forecasts from two perspectives:

- Users

- Forecasters during operations

Report back to group.

Repeat for second set of forecasts, if time permits.