
1

Verification Continued…

Holly C. Hartmann

Department of Hydrology and Water Resources, University of Arizona

hollyoregon@juno.com

RFC Verification Workshop, 08/14/2007

2

Agenda

1. Introduction to Verification
   - Applications, Rationale, Basic Concepts
   - Data Visualization and Exploration
   - Deterministic Scalar Measures

2. Categorical Measures – KEVIN WERNER
   - Deterministic Forecasts
   - Ensemble Forecasts

3. Diagnostic Verification
   - Reliability
   - Discrimination
   - Conditioning/Structuring Analyses

4. Lab Session/Group Exercise
   - Developing Verification Strategies
   - Connecting to Forecast Operations and Users

3

Probabilistic Ensemble Forecasts

From: California-Nevada River Forecast Center

4

Probabilistic Ensemble Forecasts

From: California-Nevada River Forecast Center

5

Probabilistic Ensemble Forecasts

From: A. Hamlet, University of Washington

6

From: A. Hamlet, University of Washington

7

Probabilistic Ensemble Forecasts

From: A. Hamlet, University of Washington

8

Talagrand Diagram – Also Called Ranked Histogram

• Identifies systematic flaws of an ensemble prediction system.
• Shows the effectiveness of the ensemble distribution in sampling the observations.
• Does not indicate that the ensemble will be of practical use.

9

Principle Behind Talagrand Diagram

With only one ensemble member, all (100%) of the observations will fall "outside" the ensemble.

With two ensemble members, two out of three observations (2/3 ≈ 67%) should fall outside.

With three ensemble members, two out of four observations (2/4 = 50%) should fall outside.

In general, for N ensemble members, a fraction 2/(N+1) of the observations should fall outside the ensemble.

Adapted from A. Persson, 2006
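For the four-member ensembles in the worked example that follows (12 forecast years), this benchmark for a statistically consistent ensemble works out to:

\[
\frac{2}{N+1} = \frac{2}{4+1} = 40\% \ \text{of observations outside the ensemble},
\qquad
\frac{12\ \text{observations}}{5\ \text{bins}} = 2.4\ \text{observations expected per bin}.
\]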

10

Talagrand Diagram Computation Example

Four sample ensemble members (E1 – E4) for daily flow forecasts (produced from reforecasts using carryover each year):

YEAR   E1   E2   E3   E4   OBS   Bin #
1981   42   74   82   90   112     5
1982   65  143  223  227   206     3
1983   82  192  295  300   301     5
1984  211  397  514  544   516     4
1985  142  291  349  356   348     3
1986  114  277  351  356    98     1
1987   98  170  204  205   156     2
1988   69  169  229  236   245     5
1989   94  219  267  270   233     3
1990   59  175  244  250   248     4
1991  108  189  227  228   227     4
1992   94  135  156  158   167     5

Step 1: Rank the members lowest to highest for each year. Four members result in 5 bins.

Step 2: Determine which bin the corresponding observation falls into.

Step 3: Tally how many observations fall in each bin.

Step 4: Plot the frequency of observations for each ranked bin.

Bin #:   1   2   3   4   5
Tally:   1   1   3   3   4
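The binning and tallying above are easy to script. Here is a minimal Python sketch of Steps 1-4 using the table's data (the observation-equals-member tie in 1991 is resolved by placing the observation above the tied member, which matches the slide's bin assignment; that convention is an assumption):

import numpy as np

# Ensemble members (E1-E4) and observations, 1981-1992, from the table above.
ens = np.array([
    [42, 74, 82, 90],     [65, 143, 223, 227],  [82, 192, 295, 300],
    [211, 397, 514, 544], [142, 291, 349, 356], [114, 277, 351, 356],
    [98, 170, 204, 205],  [69, 169, 229, 236],  [94, 219, 267, 270],
    [59, 175, 244, 250],  [108, 189, 227, 228], [94, 135, 156, 158],
])
obs = np.array([112, 206, 301, 516, 348, 98, 156, 245, 233, 248, 227, 167])

n_members = ens.shape[1]
tally = np.zeros(n_members + 1, dtype=int)              # 4 members -> 5 bins

for members, o in zip(ens, obs):
    ranked = np.sort(members)                           # Step 1: rank lowest to highest
    bin_idx = np.searchsorted(ranked, o, side="right")  # Step 2: 0-based bin index
    tally[bin_idx] += 1                                 # Step 3: tally per bin

print("Bin #:", *range(1, n_members + 2))               # Step 4: print/plot frequencies
print("Tally:", *tally)                                 # -> 1 1 3 3 4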


12

Talagrand Diagram

[Figure: bar chart of the tally above – the frequency of observations (0 to 4.5) in each of the five categories defined by the ordered ensemble members.]

13

Talagrand Diagram: 25 traces/ensemble, 375 observations

Example: "U-shaped" – observations too often fall outside the ensemble. Indicates the ensemble spread is too small.

Example: "L-shaped" – observations are too often larger (smaller) than the ensemble. Indicates an under- (over-) forecasting bias.

Example: "N-shaped" (dome-shaped) – observations too rarely fall outside the ensemble. Indicates the ensemble spread is too big.

Example: "Flat" – observations fall uniformly across the ensemble. Indicates an appropriately sized ensemble distribution.

[Figure: four rank histograms (relative frequency of analysis vs. category 1-26, defined by the 25 ordered ensemble members), one for each of the shapes described above.]

14

Talagrand Diagram Example: Interpretation?

(Forecast table, bins, and tally as in the computation example above.)

[Figure: the resulting Talagrand diagram – frequency of observations (0 to 4.5) per category 1-5 – labeled "???".]

15

Distributions-Oriented Forecast Evaluation

leads to Diagnostic Verification

It’s all about conditional and marginal distributions!

P(O|F), P(F|O), P(F), P(O)

Reliability, Discrimination, Sharpness, Uncertainty
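These quantities come from the two standard factorizations of the joint distribution of forecasts and observations (the Murphy–Winkler distributions-oriented framework):

\[
p(f,o) = p(o \mid f)\,p(f) \quad \text{(calibration–refinement: reliability and sharpness)},
\]
\[
p(f,o) = p(f \mid o)\,p(o) \quad \text{(likelihood–base rate: discrimination and uncertainty)}.
\]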

16

Forecast Reliability – P(O|F)

For a specified forecast condition, what does the distribution of observations look like?

[Figure: two schematic reliability plots – relative frequency of observed vs. forecasted probability, both axes 0 to 1.]

User perspective: “When you say 20% chance of flood flows, how often do flood flows actually happen?”

User perspective: “When you say 80% chance of flood flows, how often do flood flows actually happen?”

17

Reliability (Attributes) Diagram – Reliability, Sharpness

Good reliability – points close to the diagonal.

Sharpness diagram (p(f)) – a histogram of the forecasts in each probability bin; shows the marginal distribution of the forecasts.

The reliability diagram is conditioned on the forecasts. That is: given that X was predicted, what was the outcome?

18

Reliability Diagram Example Computation

Step 1: Choose a threshold value on which to base the probability forecasts. For simplicity we'll choose the mean forecast over all years and all ensemble members (= 208).

YEAR   E1   E2   E3   E4   OBS
1981   42   74   82   90   112
1982   65  143  223  227   206
1983   82  192  295  300   301
1984  211  397  514  544   516
1985  142  291  349  356   348
1986  114  277  351  356    98
1987   98  170  204  205   156
1988   69  169  229  236   245
1989   94  219  267  270   233
1990   59  175  244  250   248
1991  108  189  227  228   227
1992   94  135  156  158   167

19

Reliability Diagram Example Computation

Step 2: Choose how many forecast probability categories to use (five here: 0, 0.25, 0.5, 0.75, 1.0).

Step 3: For each forecast, calculate the forecast probability below the threshold value – the fraction of the four ensemble members below 208. For example, in 1985 only one member (142) falls below 208, so P(peakfor < 208) = 0.25.

YEAR   E1   E2   E3   E4   OBS   P(peakfor < 208)
1981   42   74   82   90   112   1.00
1982   65  143  223  227   206   0.50
1983   82  192  295  300   301   0.50
1984  211  397  514  544   516   0.00
1985  142  291  349  356   348   0.25
1986  114  277  351  356    98   0.25
1987   98  170  204  205   156   1.00
1988   69  169  229  236   245   0.50
1989   94  219  267  270   233   0.25
1990   59  175  244  250   248   0.50
1991  108  189  227  228   227   0.50
1992   94  135  156  158   167   1.00


21

Reliability Diagram Example Computation

Step 4: Group the observations into groups of equal forecast probability (or, more generally, into forecast probability categories).

P(peakfor < 208) = 0.00:  516
P(peakfor < 208) = 0.25:  348, 98, 233
P(peakfor < 208) = 0.50:  206, 301, 245, 248, 227
P(peakfor < 208) = 0.75:  N/A
P(peakfor < 208) = 1.00:  112, 156, 167


23

Reliability Diagram Example Computation

Step 5: For each group, calculate the frequency of observations below the threshold value, 208 cfs.

P(obs peak < 208, given P(peakfor < 208) = 0.00) = 0/1 = 0.00
P(obs peak < 208, given P(peakfor < 208) = 0.25) = 1/3 = 0.33
P(obs peak < 208, given P(peakfor < 208) = 0.50) = 1/5 = 0.20
P(obs peak < 208, given P(peakfor < 208) = 0.75) = 0/0 = N/A
P(obs peak < 208, given P(peakfor < 208) = 1.00) = 3/3 = 1.00


25

Reliability Diagram Example Computation

Step 6: Plot the centroid of each forecast category (just points in our case) on the x-axis against the observed frequency within that category on the y-axis. Include the 45-degree diagonal for reference.

26

Reliability Diagram Example Computation

Step 7: Include a sharpness plot showing the number of observation/forecast pairs in each category.
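Steps 1-5 are straightforward to script. A minimal Python sketch over the same example data (the plotting of Steps 6-7 is left to matplotlib and omitted):

import numpy as np

# Ensemble members (E1-E4) and observations, 1981-1992, from the table above.
ens = np.array([
    [42, 74, 82, 90],     [65, 143, 223, 227],  [82, 192, 295, 300],
    [211, 397, 514, 544], [142, 291, 349, 356], [114, 277, 351, 356],
    [98, 170, 204, 205],  [69, 169, 229, 236],  [94, 219, 267, 270],
    [59, 175, 244, 250],  [108, 189, 227, 228], [94, 135, 156, 158],
])
obs = np.array([112, 206, 301, 516, 348, 98, 156, 245, 233, 248, 227, 167])

threshold = ens.mean()                       # Step 1: mean of all members (~208)
p_fcst = (ens < threshold).mean(axis=1)      # Step 3: forecast prob. below threshold
event = obs < threshold                      # did the observed peak fall below it?

for p in (0.0, 0.25, 0.5, 0.75, 1.0):        # Step 2: five probability categories
    in_cat = p_fcst == p                     # Step 4: group obs by forecast prob.
    n = in_cat.sum()                         # sharpness: pair count per category
    freq = event[in_cat].mean() if n else float("nan")  # Step 5: observed frequency
    print(f"P(fcst < thresh) = {p:4.2f}: n = {n:2d}, observed frequency = {freq:.2f}")

# Step 6: plot freq vs. p with a 45-degree diagonal; Step 7: bar-plot n per category.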

27

Reliability Diagram – Reliability, Sharpness – P(O|F)

Good reliability – points close to the diagonal.

Sharpness diagram (p(f)) – a histogram of the forecasts in each probability bin; shows the marginal distribution of the forecasts.

Good resolution – a wide range of observed frequencies corresponding to the forecast probabilities.

Skill – related to the Brier Skill Score, in reference to sample climatology (not historical climatology).

The reliability diagram is conditioned on the forecasts. That is: given that X was predicted, what was the outcome?

28

Attributes Diagram – Reliability, Resolution, Skill/No-skill

Overall relative frequency of observations (sample climatology).

Points closer to the perfect-reliability line than to the no-resolution line: those subsamples of the probabilistic forecast contribute positively to overall skill (as defined by the BSS), in reference to sample climatology.

No-skill line: halfway between the perfect-reliability line and the no-resolution line, with sample climatology as the reference.
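As background beyond the slide text, this geometry follows from the Murphy decomposition of the Brier score over the k forecast categories:

\[
\mathrm{BS} \;=\; \underbrace{\frac{1}{n}\sum_{k} n_k\,(f_k - \bar{o}_k)^2}_{\text{reliability}}
\;-\; \underbrace{\frac{1}{n}\sum_{k} n_k\,(\bar{o}_k - \bar{o})^2}_{\text{resolution}}
\;+\; \underbrace{\bar{o}\,(1-\bar{o})}_{\text{uncertainty}}
\]

where f_k is the forecast probability of category k, \bar{o}_k the observed frequency in that category, and \bar{o} the sample climatology. A point (f_k, \bar{o}_k) contributes positive skill when (f_k - \bar{o}_k)^2 < (\bar{o}_k - \bar{o})^2; the boundary \bar{o}_k = (f_k + \bar{o})/2 is exactly the no-skill line halfway between the diagonal and the no-resolution line.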

29

Interpretation of Reliability Diagrams

[Figure panels, from Wilks (1995): climatology; minimal resolution; underforecasting; good resolution at the expense of reliability; reliable forecasts of a rare event; small sample size.]

30

Interpretation of Reliability Diagrams

Reliability, P[O|F]: Does the frequency of occurrence match your probability statement? Identifies conditional bias.

[Schematic: relative frequency of observations vs. forecasted probability, with the no-resolution line marked.]

31

EVS Reliability Diagram Examples

Arkansas-Red Basin, 24-hr flows, lead times of 1-14 days.

25th-percentile observed flows (low flows): sharp forecasts, but low resolution.

85th-percentile observed flows (high flows): good reliability at shorter lead times; long leads miss the high events.

From: J. Brown, EVS Manual

32

Historical seasonal water supply outlooks

Colorado River Basin

Morrill, Hartmann, and Bales, 2007

33

Reliability: Colorado Basin ESP Seasonal Supply Outlooks

[Figure: relative frequency of observations vs. forecast probability (high 30%, mid 40%, low 30%) for Lower Colorado (LC) outlooks at 5-, 3-, and 2-month leads (issued Jan 1, Mar 1, Apr 1) and Upper Colorado (UC) outlooks at 7-, 4-, and 2-month leads (issued Jan 1, Apr 1, Jun 1).]

1) Few high-probability forecasts; good reliability between 10-70% probability; reliability improves.

2) These months show the best reliability; low resolution limits reliability.

3) Reliability decreases for later forecasts as resolution increases; UC is good at the extremes.

Franz, Hartmann, and Sorooshian, 2003

34

Discrimination – P(F|O)

For a specified observation category, what do the forecast distributions look like?

User perspective: "When dry conditions happen, what do the forecasts usually look like?"

You sure hope that forecasts look different when there's a drought, compared to when there's a flood!

35

Discrimination – P(F|O)

Example: NWS CPC seasonal climate outlooks, sorted into DRY cases (lowest tercile), 1995-2001, all forecasts, all lead times.

[Figure: two panels of relative frequency of the indicated forecast vs. forecasted probability (0.00 to 1.00), showing the probability of dry and the probability of wet against climatology (0.33). One panel: good discrimination! The other: not much discrimination!]

36

Discrimination: Lower Colorado ESP Supply Outlooks

When unusually low flows happened – P(F | low flows), low < 30th percentile:

[Figure: relative frequency of forecasts vs. forecast probability for the Jan 1 forecast of Jan-May supply, with high, mid-, and low traces.]

There is some discrimination: the early forecasts warned "High flows less likely."

Franz, Hartmann, and Sorooshian (2003)

37

Discrimination: Lower Colorado ESP Supply Outlooks

When unusually low flows happened – P(F | low flows), low < 30th percentile:

[Figure: relative frequency of forecasts vs. forecast probability for the Jan 1 (Jan-May) and Apr 1 (Apr-May) forecasts, with high, mid-, and low traces.]

There is some discrimination: the early forecasts warned "High flows less likely." The Apr 1 forecasts show good discrimination, saying (1) high and mid-flows are less likely and (2) low flows are more likely.

Franz, Hartmann, and Sorooshian (2003)

38

Discrimination: Colorado Basin ESP Supply Outlooks

For observed flows in the lowest 30% of the historic distribution:

[Figure: relative frequency of forecasts vs. forecast probability (high 30%, mid 40%, low 30%) for the Lower Colorado Basin – Jan-May (5 mo. lead) and April-May (2 mo. lead) – and the Upper Colorado Basin – Jan-July (7 mo. lead) and June-July (2 mo. lead).]

1) High flows less likely.

2) No discrimination between mid and low flows.

3) Both UC and LC show good discrimination for low flows at a 2-month lead time.

Franz, Hartmann, and Sorooshian (2003)

39

Historical seasonal water supply outlooks

Colorado River Basin

40

Discrimination: CDF Perspective

The CDF of all observations is plotted, color-coded by tercile.

Forecast ensemble members are sorted into three groups according to which tercile their associated observation falls into.

The CDF for each group is plotted in the corresponding color (e.g., high in blue).

Credit: K. Werner

41

In this case, there is relatively good discrimination, since the three conditional forecast CDFs separate from one another.

Discrimination

Credit: K. Werner

42

Discrimination Example Computation

Step 1: Order the observations and divide the ordered list into categories. Here we use terciles: low ≤ 167, 206 ≤ middle ≤ 245, high ≥ 248.

YEAR   E1   E2   E3   E4   OBS   OBS Tercile
1981   42   74   82   90   112   Low
1982   65  143  223  227   206   Middle
1983   82  192  295  300   301   High
1984  211  397  514  544   516   High
1985  142  291  349  356   348   High
1986  114  277  351  356    98   Low
1987   98  170  204  205   156   Low
1988   69  169  229  236   245   Middle
1989   94  219  267  270   233   Middle
1990   59  175  244  250   248   High
1991  108  189  227  228   227   Middle
1992   94  135  156  158   167   Low

Credit: K. Werner

43

Discrimination Example Computation

Step 2: Group the forecast ensemble members according to the OBS tercile.

Low OBS forecasts: 42, 74, 82, 90; 114, 277, 351, 356; 98, 170, 204, 205; 94, 135, 156, 158

Mid OBS forecasts: 65, 143, 223, 227; 69, 169, 229, 236; 94, 219, 267, 270; 108, 189, 227, 228

High OBS forecasts: 82, 192, 295, 300; 211, 397, 514, 544; 142, 291, 349, 356; 59, 175, 244, 250

Credit: K. Werner


48

Discrimination Example Computation

Step 3: Plot the CDF of all observations, color-coded by tercile (low ≤ 167, 206 ≤ middle ≤ 245, high ≥ 248).

Credit: K. Werner

49

Discrimination Example Computation

Step 4: Add the CDFs of the forecasts conditioned on the observed terciles to the plot, using the low-, mid-, and high-OBS forecast groups from Step 2.

Credit: K. Werner

50

Step 5: Discrimination is shown by the degree to which the conditional forecast CDFs are separated from each other.

In this case, high forecasts discriminate better than mid and low forecasts.

Discrimination Example Computation

Credit: K. Werner
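A minimal Python sketch of Steps 1-4 over the same example data (numpy's default quantile interpolation reproduces the slide's low/middle/high split; summarizing each group by its median is my own shorthand for eyeballing the CDF separation):

import numpy as np

# Ensemble members (E1-E4) and observations, 1981-1992, from the table above.
ens = np.array([
    [42, 74, 82, 90],     [65, 143, 223, 227],  [82, 192, 295, 300],
    [211, 397, 514, 544], [142, 291, 349, 356], [114, 277, 351, 356],
    [98, 170, 204, 205],  [69, 169, 229, 236],  [94, 219, 267, 270],
    [59, 175, 244, 250],  [108, 189, 227, 228], [94, 135, 156, 158],
])
obs = np.array([112, 206, 301, 516, 348, 98, 156, 245, 233, 248, 227, 167])

edges = np.quantile(obs, [1/3, 2/3])           # Step 1: tercile boundaries
tercile = np.digitize(obs, edges)              # 0 = low, 1 = middle, 2 = high

for c, name in enumerate(["Low", "Middle", "High"]):
    pool = np.sort(ens[tercile == c].ravel())  # Step 2: pool members by OBS tercile
    cdf = np.arange(1, pool.size + 1) / pool.size  # Steps 3-4: empirical CDF (plot pool vs. cdf)
    # Step 5: separation between the three (pool, cdf) curves shows discrimination.
    print(f"{name:>6}-OBS forecasts: n = {pool.size}, median = {np.median(pool):.1f}")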

51

How well do April – July volume forecasts discriminate when they are made in Jan, Mar, and May?

Poor discrimination in Jan between forecasting high and medium flows. Best discrimination in May.

Discrimination

Credit: K. Werner

52

Another way to look at discrimination is to use PDFs in lieu of CDFs.

The more separation between the PDFs, the better the discrimination.

Discrimination

Credit: K. Werner

53

Deterministic forecasts

• traditional in hydrology

• sub-optimal for decision making

Common perspective

“Deterministic model simulations and probabilistic forecasts … are two entirely different types of products. Direct comparison of probabilistic forecasts with deterministic single valued forecasts is extremely difficult”

Comparing Deterministic & Probabilistic Forecasts

- Anonymous

54

How can we compare deterministic and probabilistic forecasts?

[Figure: a deterministic (single-valued) forecast vs. a probabilistic (ensemble) forecast. Source: XEFS Design Team, 2007]

Option: Use the ensemble median with standard metrics – No!

55

From: A. Hamlet, University of Washington

The ensemble mean minimizes error, but doesn’t represent the overall behavior.

“Pretend Determinism”

56

What’s wrong with using ‘deterministic’ metrics?

Metrics using only the central tendency of each forecast PDF fail to distinguish between forecasts 1-3, but will identify 4 as inferior. Metrics that reward accuracy but punish spread will rank the forecast skill from 1 to 4.

[Figure: four forecast PDFs (labeled 1-4) plotted against the observed value; probability density vs. value from 0 to 2.5. From: A. Hamlet, University of Washington]
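The slide does not name a particular spread-aware metric; as one illustration, the continuous ranked probability score (CRPS) rewards accuracy while accounting for spread. A minimal sketch with two hypothetical five-member ensembles that share a median but differ in spread:

import numpy as np

def crps_ensemble(members, o):
    """Sample CRPS estimate: E|X - o| - 0.5 * E|X - X'| over ensemble members X, X'."""
    x = np.asarray(members, dtype=float)
    accuracy = np.abs(x - o).mean()                        # mean distance to the observation
    spread = 0.5 * np.abs(x[:, None] - x[None, :]).mean()  # half the mean pairwise spread
    return accuracy - spread

o = 1.0                                               # the observation
tight = o + np.array([-0.1, -0.05, 0.0, 0.05, 0.1])   # sharp ensemble, centered on obs
wide = o + np.array([-1.0, -0.5, 0.0, 0.5, 1.0])      # same median, much larger spread

for name, members in [("tight", tight), ("wide", wide)]:
    med_err = abs(np.median(members) - o)             # "pretend determinism" error: 0 for both
    print(f"{name}: median abs error = {med_err:.2f}, CRPS = {crps_ensemble(members, o):.3f}")

The median-based error cannot separate the two ensembles, while the CRPS correctly prefers the sharper one.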


58

Deterministic vs. Probabilistic Forecasts

[Figure: PDFs of the climatology distribution and the forecast distribution vs. flow Q, with tercile boundaries (equal probability), the deterministic forecast, and the observation marked. A jack-knife calibration error yields a PDF of the error distribution, from which any quantiles can be determined. Approach used by Morrill, Hartmann, and Bales, 2007.]

59

Lab Session – Group Exercise

Choose a set of forecasts.

Develop strategies for verifying these forecasts from two perspectives:

- Users

- Forecasters during operations

Report back to the group.

Repeat for second set of forecasts, if time permits.