
Pertti Nurmi & Laurie Wilson General Guide to Forecast Verification => Exercises


Page 1: Pertti Nurmi & Laurie Wilson General Guide to Forecast Verification =>  Exercises

16.4.2005 NOMEK - Verification Exercises - OSLO / [email protected]

Pertti Nurmi & Laurie Wilson

General Guide to Forecast Verification => Exercises <=

Questions

NOMEK - Oslo, 15.-16.4.2005

[Figure: long-term skill trend 1980-2000 for T max D+2, T mean D+1-5 and T mean D+6-10 forecasts; and T2m ME & MAE for ECMWF & LAM, averaged over 30 stations, Winter 2003, plotted against forecast length 6-120 hrs (curves: MAE_ECMWF, MAE_LAM, ME_ECMWF, ME_LAM; errors in °C)]

Page 2

Exercise 1: Spot Temperature Scatter Plot (a)

Spot temperature scatter plot Exercise

Attached is a scatter plot of Observed vs. Forecast spot temperatures (the axes are unfortunately reversed compared to the lecture notes) and their conditional distributions. The forecasts are for +40 hours for 11 stations, and the total sample size is 701 cases. The numbers on the scatter plot represent the number of occurrences of observed and forecast temperatures for each whole degree Celsius.

Questions on the scatter plot:

1. Is there a bias? Are forecast temperatures generally too high or too low?

2. How many cases involve errors > 10 °C?

3. How are observed temperatures above +10 °C handled? How are observed temperatures below -20 °C handled?

4. Does the technique ever forecast temperatures < -20 °C?

5. Assume that temperatures < -20 °C and temperatures > +10 °C represent extreme events for this station in winter. Hatch the false alarm area and the missed event area for these extreme temperatures. (“False alarms” refer to forecasts of extremes that were not observed and “missed events” are occurrences of extremes that were not forecast.)

6. If there were no skill in the forecast, what would the graph look like?
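Questions 1 and 2 reduce to simple computations once the forecast-observation pairs are available as arrays. A minimal Python sketch, using a small hypothetical sample rather than the 701-case dataset of the exercise:

```python
# Sketch of the checks behind questions 1-2; the numbers below are
# hypothetical paired data, not the exercise's scatter plot sample.
fcst = [-3.0, 1.5, -12.0, 4.0, -25.0]   # forecast spot temperatures (°C)
obs  = [-1.0, 2.0,  -1.5, 5.0, -22.0]   # observed temperatures (°C)

n = len(fcst)
bias = sum(f - o for f, o in zip(fcst, obs)) / n        # mean error (ME)
mae = sum(abs(f - o) for f, o in zip(fcst, obs)) / n    # mean absolute error
big_errors = sum(1 for f, o in zip(fcst, obs) if abs(f - o) > 10)

print(f"bias = {bias:.2f} °C, MAE = {mae:.2f} °C, "
      f"|error| > 10 °C in {big_errors} cases")
```

A negative bias means the forecasts are too low on average, matching the reading of the scatter plot asked for in question 1.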

Page 3

Exercise 1: Spot Temperature Scatter Plot (b)

Page 4

Exercise 2: Categorical Events Contingency Table
Gale-force winds Contingency Table Exercise

Attached is a contingency table of five months of categorical warnings against gale-force winds, i.e. wind speeds exceeding 14 m/s (left). Compute the specified verification statistics. For reference, the corresponding “Finley” tornado verification statistics are shown (right). Interpret the scores and compare the two.

Tornado reference values:
B = 2.00, PC = 0.97, POD = 0.60, FAR = 0.70, PAG = 0.30, F = 0.03, KSS = 0.57, TS = 0.25, ETS = 0.24, HSS = 0.39, OR = 57.43, ORSS = 0.97

Gale values to compute:
B = ____, PC = ____, POD = ____, FAR = ____, PAG = ____, F = ____, KSS = ____, TS = ____, ETS = ____, HSS = ____, OR = ____, ORSS = ____

Tornado            observed
forecast        Yes      No      fc
Yes              30      70     100
No               20    2680    2700
obs              50    2750    2800

Gale               observed
forecast        Yes      No      fc
Yes              15       2      17
No               11     123     134
obs              26     125     151

Page 5

Exercise 3: Reliability Table (Diagram) (a)
Reliability Table Exercise

Part 1

You have two reliability tables. They represent verification of probability forecasts of heavy precipitation (> 10 mm in 12 hr) produced by two different methods. The verification sample is one year of data at 72 stations.

Questions:

1. Interpret the table for technique A. What can be said about probability forecasts over 40%?
2. Which technique produces sharper forecasts? How is this indicated on the diagram?

Part 2

You have two reliability tables, one for 0 to 6 hr POP forecasts and the other for 42 to 48 hr POP forecasts (6 hr periods). Forecasts are for 220 stations over a three month period. On each graph there are two dotted lines, representing the forecasts from two different techniques. Technique A is represented by the blue dots and Technique B by the red dots. In the upper left corner, the histograms indicate the number of times each of the 10 probability categories was predicted. Technique A is shown on the histograms by the blue bars and Technique B by the red bars. The frequencies of prediction of each probability category are also indicated by the numbers beside the points on the graphs. The horizontal line is the sample climatological frequency of occurrence of precipitation.

Questions:

1. Comment on the reliability of the two techniques as indicated by both tables. What does a forecast of 85% actually mean at 0 to 6 hr and 42 to 48 hr?
2. Which technique is sharper at 0 to 6 hr? At 42 to 48 hr? How do you know?
3. The two extra plotted green points represent categorical forecasts of precipitation from a third technique. Comment on the reliability of this method for both forecast periods.
4. Which of the two probability forecast techniques produces the better forecasts in your opinion? Why?
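Each point of a reliability diagram like the ones in this exercise is obtained by binning the forecast probabilities into categories and computing the observed frequency of the event within each bin. A minimal Python sketch with hypothetical data (not the 220-station sample of the exercise):

```python
# Sketch of how a reliability table is built: bin the probability forecasts
# into 10 categories, then compare each bin's mean forecast probability
# with the observed frequency. Data here are hypothetical.
forecasts = [0.05, 0.10, 0.15, 0.35, 0.40, 0.65, 0.70, 0.85, 0.90, 0.95]
observed  = [0,    0,    0,    1,    0,    1,    1,    1,    1,    1]  # 1 = precip occurred

bins = [[] for _ in range(10)]            # 10 probability categories
for p, o in zip(forecasts, observed):
    k = min(int(p * 10 + 1e-9), 9)        # 0.0-0.1 -> bin 0, ..., 0.9-1.0 -> bin 9
    bins[k].append((p, o))

for k, cases in enumerate(bins):
    if cases:                             # skip empty categories
        mean_p = sum(p for p, _ in cases) / len(cases)
        freq = sum(o for _, o in cases) / len(cases)
        print(f"bin {k}: n={len(cases)}  "
              f"mean forecast={mean_p:.2f}  observed freq={freq:.2f}")
```

Plotting observed frequency against mean forecast probability per bin gives the reliability curve; the per-bin counts `n` give the sharpness histogram shown in the upper left corner of the diagrams.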

Page 6

Exercise 3: Reliability Table (Diagram) (b) – Part 1

[Figure: reliability diagrams and sharpness histograms for techniques A and B]

Page 7

Exercise 3: Reliability Table (Diagram) (c) – Part 2

Page 8

Exercise 4: Signal Detection Theory
Signal Detection Theory Exercise

Attached is a graph of Relative Operating Characteristic curves for probability forecasts of wind speed greater than 10 m/s derived from the ECMWF Ensemble Prediction System. Forecasts are for the winter season for c. 250 European stations. Three ROC curves are shown, for 96 hr, 144 hr, and 240 hr forecasts, respectively.

Questions relating to the ROC:

1. Is there discriminating power in the forecasts at any or all projections? Why or why not?

2. Note that two of the curves (144 hr and 240 hr) cross over. What does this mean?

[Figure: ROC curves for the 96 hr, 144 hr and 240 hr forecasts, with ROC area (ROCA) values]
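Each ROC curve is traced by thresholding the probability forecasts at successive levels and plotting hit rate (POD) against false alarm rate (F); the ROC area then follows from the trapezoid rule. A minimal Python sketch with hypothetical data, not the EPS wind sample of the exercise:

```python
# Sketch of ROC construction: one (F, POD) point per probability threshold,
# plus the trivial (0,0) and (1,1) end points. Data are hypothetical.
probs = [0.1, 0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 0.3, 0.6, 0.05]
event = [0,   0,   1,   0,   1,   1,   1,   0,   1,   0]  # 1 = wind > 10 m/s observed

def roc_points(probs, event, thresholds):
    pts = [(1.0, 1.0)]                       # threshold 0: always forecast "yes"
    for t in thresholds:
        yes = [p >= t for p in probs]
        hits = sum(1 for y, o in zip(yes, event) if y and o)
        misses = sum(1 for y, o in zip(yes, event) if not y and o)
        fas = sum(1 for y, o in zip(yes, event) if y and not o)
        cns = sum(1 for y, o in zip(yes, event) if not y and not o)
        pts.append((fas / (fas + cns), hits / (hits + misses)))  # (F, POD)
    pts.append((0.0, 0.0))                   # threshold > 1: never forecast "yes"
    return pts

pts = sorted(roc_points(probs, event, thresholds=[0.2, 0.4, 0.6, 0.8]))
# ROC area by the trapezoid rule over the sorted (F, POD) points
area = sum((x2 - x1) * (y1 + y2) / 2
           for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
print(pts, round(area, 2))
```

An area near 1.0 indicates strong discrimination; 0.5 corresponds to no skill (the diagonal), which is the interpretation asked for in question 1.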

Page 9

Pertti Nurmi & Laurie Wilson

General Guide to Forecast Verification => Exercises <=

Answers

NOMEK - Oslo, 15.-16.4.2005


Page 10

Exercise 1: Answers

Page 11

Exercise 1: Answer (1,2)

[Scatter plot annotations: “Forecast too high (positive bias)” and “Forecast too low (negative bias)”]

(1) It is usually possible to see at a glance whether the bulk of points lie above or below the 45 degree line. In this case, there appears to be a slight negative bias (more points above the line). The actual bias is -0.55 °C for this dataset.

(2) There are 46 cases (6.6%) of the forecasts with errors greater than 10 °C. All points lying outside both diagonals drawn 10 degrees away from the 45 degree line count.

Page 12

Exercise 1: Answer (3)

(3) Observed temperatures above 10 °C are forecast too low on average; temperatures below -20 °C are forecast too high on average.

This example of forecasting extreme occurrences toward the mean is a characteristic of many statistical forecasts. This does not necessarily mean that the technique is incapable of forecasting extreme temperatures.

Page 13

Exercise 1: Answer (4)

-25 °C

(4) Yes. The lowest forecast temperature is -29 °C, 3 degrees lower than the lowest observed temperature.

There are 8 forecast cases of temperatures below -25 °C, but only one observed case.

Page 14

Exercise 1: Answer (5)

[Scatter plot with hatched regions marking “Missed events” and “False alarms” for both extreme thresholds]

(5) All occurrences of temperature > +10 °C were missed. All except one of the occurrences < -20 °C were missed.

All the forecasts of < -20 °C are false alarms except for 2 cases.

Page 15

Exercise 1: Answer (6)

(6) Visualize all temperature forecasts moved toward the mean. The observation set cannot be changed, so points can be “moved” only along the x-axis. A completely unskilled forecast would appear as an array of points with no discernible orientation along the 45 degree line. In this case, the orientation would be a vertical line, which may or may not coincide with the observed mean temperature.

Page 16

Exercise 1: Answers

Answers to spot temperature scatter plot Exercise

1. It is usually possible to see at a glance whether the bulk of the points lie above or below the 45 degree line. In this case, there appears to be a slight negative bias (more points above the line), meaning temperatures are forecast too low on average. When assessing bias graphically, it is necessary to estimate the cumulative distance from the line as well as the number of points on each side of the line. The actual bias is -0.55 °C for this dataset.

2. There are 46 cases, or 6.6% of the forecasts, with errors greater than 10 °C. All points lying outside both diagonals drawn 10 degrees away from the 45 degree line count.

3. Observed temperatures above 10 °C are forecast too low on average, and temperatures below -20 °C are forecast too high on average. This example of forecasting extreme occurrences toward the mean is a characteristic of many statistical forecasts. This does not necessarily mean that the technique is incapable of forecasting extreme temperatures.

4. Yes. The lowest forecast temperature is -29 °C, 3 degrees lower than the lowest observed temperature. In this dataset, there are altogether 8 forecasts of temperatures below -25 °C.

5. See diagram. All occurrences of temperature > +10 °C were missed. All except one of the occurrences < -20 °C were missed. All the forecasts of < -20 °C are false alarms except for 2 cases.

6. Visualize prediction of all temperatures toward the mean. The observation set cannot be changed, so points can be “moved” only horizontally. A completely unskilled forecast would appear as an array of points with no discernible orientation along the 45 degree line. In this case, the orientation would be along a vertical line, which may or may not be the same as the observed mean temperature. Lack of skill can also be expressed by saying that the distributions of the forecasts given the observations lie on top of each other; there is no correlation between forecast and observation.

Page 17

Exercise 2: Answers
Categorical Events Contingency Table

Tornado reference values:
B = 2.00, PC = 0.97, POD = 0.60, FAR = 0.70, PAG = 0.30, F = 0.03, KSS = 0.57, TS = 0.25, ETS = 0.24, HSS = 0.39, OR = 57.43, ORSS = 0.97

B = (a+b)/(a+c) = ____
PC = (a+d)/n = ____
POD = a/(a+c) = ____
FAR = b/(a+b) = ____
PAG = a/(a+b) = ____
F = b/(b+d) = ____
KSS = POD - F = ____
TS = a/(a+b+c) = ____
ETS = (a-ar)/(a+b+c-ar) = ____, where ar = (a+b)(a+c)/n is the number of hits expected by chance
HSS = 2(ad-bc)/[(a+c)(c+d)+(a+b)(b+d)] = ____
OR = ad/bc = ____
ORSS = (OR-1)/(OR+1) = ____

Tornado            observed
forecast        Yes      No      fc
Yes              30      70     100
No               20    2680    2700
obs              50    2750    2800

Gale               observed
forecast        Yes      No      fc
Yes              15       2      17
No               11     123     134
obs              26     125     151

Event              observed
forecast        Yes      No      Marginal total
Yes              a        b      a + b
No               c        d      c + d
Marginal total  a + c   b + d    a + b + c + d = n

Page 18

Exercise 2: Answers
Categorical Events Contingency Table

Tornado reference values:
B = 2.00, PC = 0.97, POD = 0.60, FAR = 0.70, PAG = 0.30, F = 0.03, KSS = 0.57, TS = 0.25, ETS = 0.24, HSS = 0.39, OR = 57.43, ORSS = 0.97

B = (a+b)/(a+c) = 0.65
PC = (a+d)/n = 0.91
POD = a/(a+c) = 0.58
FAR = b/(a+b) = 0.12
PAG = a/(a+b) = 0.88
F = b/(b+d) = 0.02
KSS = POD - F = 0.56
TS = a/(a+b+c) = 0.54
ETS = (a-ar)/(a+b+c-ar) = 0.48
HSS = 2(ad-bc)/[(a+c)(c+d)+(a+b)(b+d)] = 0.65
OR = ad/bc = 83.86
ORSS = (OR-1)/(OR+1) = 0.98

Tornado            observed
forecast        Yes      No      fc
Yes              30      70     100
No               20    2680    2700
obs              50    2750    2800

Gale               observed
forecast        Yes      No      fc
Yes              15       2      17
No               11     123     134
obs              26     125     151

Event              observed
forecast        Yes      No      Marginal total
Yes              a        b      a + b
No               c        d      c + d
Marginal total  a + c   b + d    a + b + c + d = n
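The filled-in values can be checked mechanically. A minimal Python sketch of the score definitions, in the a, b, c, d notation of the generic table above:

```python
def scores(a, b, c, d):
    """Verification scores from a 2x2 contingency table:
    a = hits, b = false alarms, c = misses, d = correct rejections."""
    n = a + b + c + d
    ar = (a + b) * (a + c) / n          # hits expected by chance
    return {
        "B":    (a + b) / (a + c),
        "PC":   (a + d) / n,
        "POD":  a / (a + c),
        "FAR":  b / (a + b),
        "PAG":  a / (a + b),
        "F":    b / (b + d),
        "KSS":  a / (a + c) - b / (b + d),
        "TS":   a / (a + b + c),
        "ETS":  (a - ar) / (a + b + c - ar),
        "HSS":  2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d)),
        "OR":   (a * d) / (b * c),
        "ORSS": (a * d - b * c) / (a * d + b * c),  # equals (OR-1)/(OR+1)
    }

gale = scores(a=15, b=2, c=11, d=123)
print({k: round(v, 2) for k, v in gale.items()})
# e.g. POD = 0.58, FAR = 0.12, HSS = 0.65, OR = 83.86
```

Running `scores(30, 70, 20, 2680)` likewise reproduces the tornado reference values (OR = 57.43, ETS = 0.24, etc.).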

Page 19

Exercise 3: Reliability Diagram – Part 1

[Figure: reliability diagrams for techniques A and B]

Page 20

Exercise 3: Reliability Diagram – Part 1

[Figure: reliability diagrams for techniques A and B]

(1) The reliability curve is nearly horizontal for forecasts over 30%. Literally, this means that it practically does not matter what the forecast probability is if it is greater than 40%. Heavy precipitation will occur about 30% of the time.

(2) Technique A produces sharper forecasts. This is assessed by comparing the frequencies of forecasts in the various probability ranges for the two techniques. The greater the number of forecasts near the extremes of 100% and 0%, the sharper the technique. For example, technique B never attempts a probability forecast above 50%, while technique A issues a probability above 50% 370 times.

Page 21

Exercise 3: Reliability Diagram – Part 2

Page 22

Exercise 3: Reliability Diagram – Part 2-1

(1) Both techniques are quite reliable at both ranges. There is a slight tendency to overforecast low probabilities and underforecast high probabilities at the 0-6 hr range, especially for Technique B. As the level of forecast accuracy drops with increasing forecast projection, the reliability curves tend to move toward horizontal. At 0-6 hr, a forecast of 85% means 86% for Techn. A and 80% for Techn. B. For 42 to 48 hr, 85% means 85% (perfectly reliable) for Techn. A and 76% for Techn. B.

Page 23

Exercise 3: Reliability Diagram – Part 2-2

(2) Techn. A is sharper at 0-6 hr. This is indicated by the slight tendency toward a U-shape in the sharpness histogram and the greater number of forecasts of extreme probabilities. Techn. B is sharper at 42-48 hr. This is characteristic: Techn. A is a MOS technique, and MOS techniques tend to maintain reliability with increasing forecast projection while losing sharpness. Techn. B is a perfect-prog technique; these typically maintain sharpness but tend to lose reliability as accuracy decreases.

Page 24

Exercise 3: Reliability Diagram – Part 2-3

(3) The two plotted points on each graph are for the GEM model. Categorical forecasts are not reliable unless they are also perfect. At 0-6 hr, the model achieves a hit rate of only 52% for its precipitation forecasts, and even less, 48%, at 42-48 hr.

Page 25

Exercise 3: Reliability Diagram – Part 2-4

(4) It is a matter of preference, and depends on the way in which the forecasts will be used. Sharper techniques may be preferred as a kind of “alert” to possible extreme conditions, even at the cost of some reliability. On the other hand, reliability is preferred for forecast systems which will not be carefully monitored. These results suggest that uncertainty can be quantified reliably using PoP, and that the PoP forecasts convey more information than categorical forecasts.

Page 26

Exercise 4: Answers
Signal Detection Theory Exercise


Below is a graph of Relative Operating Characteristic curves for probability forecasts of wind speed greater than 10 m/s derived from the ECMWF EPS. Forecasts are for the winter season for c. 250 European stations. Three ROC curves are shown, for 96, 144, and 240 hr forecasts.

Questions relating to the ROC:

1. Is there discriminating power in the forecasts at any or all projections? Why or why not?
2. Note that two of the curves (144 hr and 240 hr) cross over. What does this mean?

Page 27

Exercise 4: Answers
Signal Detection Theory Exercise

(1) Yes, at all forecast ranges: the EPS is able to distinguish cases leading to winds over 10 m/s from cases leading to winds under 10 m/s. While this may seem remarkable at first glance, the sample undoubtedly contains many situations where there is little doubt because the winds are well away from the threshold. The results surely would have been different for a more demanding task, e.g. identifying winds in 5 m/s categories. The ROC area is in the 0.8 range, which is high enough to consider the forecasts useful. Although the theoretical lower limit of skill is 0.5, in meteorology the signal gets pretty weak if the area is < 0.7.


Page 28

Exercise 4: Answers
Signal Detection Theory Exercise

(2) The curves for 144 hr and 240 hr cross over about a third of the way from the lower left-hand corner. This means that the 144 hr forecasts form a better basis for decision-making at lower ranges of probability (higher hit rate vs. false alarm rate), while the 240 hr forecasts form a slightly better basis for decision-making at higher ranges of probability. The differences are very small and can be considered a random effect of this particular dataset. While crossovers are not frequent, they do occur and can reveal interesting aspects of performance.
