Multi-Model Ensembling for Seasonal-to-Interannual Prediction: From Simple to Complex


Lisa Goddard and Simon Mason
International Research Institute for Climate & Society

The Earth Institute of Columbia University

Benefit of Combining Models

Benefit of Using Multiple Models

RPSS for 2m Temperature (JFM 1950-1995)

Combining models reduces deficiencies of individual models
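RPSS (ranked probability skill score) is the skill measure used in several of these maps. The sketch below is a minimal version of the usual definition for ordered categorical forecasts such as terciles. It assumes the reference forecast is climatology with equiprobable categories. The function names are hypothetical, and this is not necessarily the exact scoring code used in the study.

```python
import numpy as np

def rps(probs, obs_cat):
    """Ranked probability score for each forecast.

    probs:   (n_forecasts, n_categories) probabilities, categories ordered low to high
    obs_cat: (n_forecasts,) index of the observed category
    """
    probs = np.asarray(probs, dtype=float)
    obs_cat = np.asarray(obs_cat, dtype=int)
    obs = np.zeros_like(probs)
    obs[np.arange(len(obs_cat)), obs_cat] = 1.0  # one-hot observed category
    # Squared differences between cumulative forecast and cumulative observed distributions
    return np.sum((np.cumsum(probs, axis=1) - np.cumsum(obs, axis=1)) ** 2, axis=1)

def rpss(probs, obs_cat, clim=None):
    """RPSS = 1 - mean RPS(forecast) / mean RPS(reference).

    1 is perfect; negative values are worse than climatology.
    """
    probs = np.asarray(probs, dtype=float)
    n_fcst, n_cat = probs.shape
    if clim is None:
        clim = np.full(n_cat, 1.0 / n_cat)  # assumed reference: equiprobable categories
    clim_probs = np.tile(clim, (n_fcst, 1))
    return 1.0 - rps(probs, obs_cat).mean() / rps(clim_probs, obs_cat).mean()
```

Take a single forecast of (0.2, 0.3, 0.5) where the above-normal tercile verifies. It has RPS = 0.29, compared with 5/9 for climatology, which gives RPSS ≈ 0.48.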

Varying Complexity in Building MM

Refining:

(1) RAW MODEL PROBABILITIES (simple)
- Tercile thresholds determined by model history; probabilities obtained by counting ensemble members in each category

(2) RECALIBRATED PDF PROBABILITIES (less simple)
- Contingency table recalibration (CT): categorical probabilities determined by the category of the ensemble mean
- Uncertainty in forecast PDFs based on ensemble-mean MSE
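The three probability treatments in the Refining list can be sketched for a single grid point. This minimal sketch rests on assumptions the slide does not state:
- Inputs are anomalies.
- Model tercile thresholds come from all hindcast members pooled over the training years.
- CT uses a 3×3 table of observed category against ensemble-mean category.
- The PDF is Gaussian, with mean equal to the current ensemble mean and variance equal to the training-period MSE of the ensemble mean.

All function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def raw_tercile_probs(ens_hist, ens_now):
    """(1) RAW: count current members against the model's own tercile thresholds.

    ens_hist: (n_years, n_members) hindcasts from the same model (its "history")
    ens_now:  (n_members,) current ensemble
    """
    lo, hi = np.quantile(np.ravel(ens_hist), [1 / 3, 2 / 3])
    ens_now = np.asarray(ens_now, dtype=float)
    below = np.count_nonzero(ens_now < lo) / ens_now.size
    above = np.count_nonzero(ens_now > hi) / ens_now.size
    return np.array([below, 1.0 - below - above, above])

def ct_tercile_probs(mean_cat_hist, obs_cat_hist, mean_cat_now, n_cat=3):
    """(2a) CT: observed-category frequencies conditioned on the ensemble-mean category."""
    table = np.zeros((n_cat, n_cat))
    for f, o in zip(mean_cat_hist, obs_cat_hist):
        table[int(f), int(o)] += 1  # rows: ensemble-mean category, columns: observed category
    row = table[int(mean_cat_now)]
    if row.sum() == 0:
        return np.full(n_cat, 1.0 / n_cat)  # no training cases in this row: use climatology
    return row / row.sum()

def pdf_tercile_probs(ens_mean_hist, obs_hist, ens_mean_now):
    """(2b) PDF: Gaussian centred on the ensemble mean, variance = ensemble-mean MSE.

    Requires a non-zero MSE over the training period.
    """
    obs_hist = np.asarray(obs_hist, dtype=float)
    lo, hi = np.quantile(obs_hist, [1 / 3, 2 / 3])  # observed tercile thresholds
    mse = np.mean((np.asarray(ens_mean_hist, dtype=float) - obs_hist) ** 2)
    dist = NormalDist(mu=float(ens_mean_now), sigma=float(np.sqrt(mse)))
    p_below = dist.cdf(float(lo))
    p_above = 1.0 - dist.cdf(float(hi))
    return np.array([p_below, 1.0 - p_below - p_above, p_above])
```

The design difference matters here. RAW uses only the model's own climatology, so it removes mean drift but does not account for skill. CT and PDF are fitted against observations, so they need hindcasts paired with verifying data.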

Combining:

(1) POOLED MM ENSEMBLE (simple)
- Each model weighted equally

(2) PERFORMANCE-BASED MM ENSEMBLE (less simple)
- Bayesian: determine optimal weights for AGCMs & climatology by maximizing likelihood
- Multiple linear regression (MLR): obtain probabilities from prediction error variance using first few moments of ensemble distributions
- Canonical Variate (CV): maximize discrimination between categories using first few moments of ensemble distributions
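Under POOLED MM ENSEMBLE, each model receives equal weight. The minimal sketch below assumes each model's category probabilities have already been computed, for example by counting members. For contrast, it also shows pooling all members into one super-ensemble, which instead weights models by ensemble size. Both function names are illustrative.

```python
import numpy as np

def pooled_mm_probs(model_probs):
    """Equal weight per model: average the per-model category probabilities.

    model_probs: (n_models, n_categories)
    """
    return np.asarray(model_probs, dtype=float).mean(axis=0)

def member_weighted_probs(model_probs, n_members):
    """Alternative (not equal weighting): one super-ensemble of all members,
    so each model's weight is proportional to its ensemble size."""
    p = np.asarray(model_probs, dtype=float)
    w = np.asarray(n_members, dtype=float)
    return (w[:, None] * p).sum(axis=0) / w.sum()
```

The distinction matters for the AGCMs used later, which range from 24 runs (CCM3, ECHAM4.5) down to 9 (NSIPP1). Pooling members would give each 24-run model about 2.7 times the weight of NSIPP1. Averaging per-model probabilities keeps the weights equal.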


Bayesian Model Combination

Climo Fcst: “Prior”

GCM Fcst: “Evidence”

[Figure: forecast probabilities at times t = 1, 2, 3, 4, …]

Combine “prior” and “evidence” to produce weighted “posterior” forecast probabilities, with the weights chosen by maximizing the likelihood.

Combining: Based on model performance
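The slide does not give the functional form of the posterior. This hedged sketch assumes it is a weighted average (linear pool) of the climatological prior and the GCM probabilities. The weights are non-negative, sum to one, and are chosen to maximize the likelihood of the observed categories over the training period. For a linear pool, that maximization can be carried out with the standard EM iteration for mixture weights. This is an illustrative substitute, and Robertson et al. (2004) may use a different form. Function names are hypothetical.

```python
import numpy as np

def fit_pool_weights(component_probs, obs_cat, n_iter=500):
    """Maximum-likelihood weights for a linear pool of categorical forecasts (via EM).

    component_probs: (n_components, n_times, n_categories); component 0 = climatology prior
    obs_cat:         (n_times,) observed category index at each training time
    """
    P = np.asarray(component_probs, dtype=float)
    obs_cat = np.asarray(obs_cat, dtype=int)
    t = np.arange(P.shape[1])
    lik = P[:, t, obs_cat]  # (n_components, n_times): probability given to what occurred
    w = np.full(P.shape[0], 1.0 / P.shape[0])  # start from equal weights
    for _ in range(n_iter):
        mix = w[:, None] * lik
        resp = mix / mix.sum(axis=0, keepdims=True)  # E-step: responsibility of each component
        w = resp.mean(axis=1)                        # M-step: weight = mean responsibility
    return w

def posterior_probs(component_probs_now, w):
    """Weighted 'posterior': (n_components, n_categories) -> (n_categories,)."""
    return np.tensordot(w, np.asarray(component_probs_now, dtype=float), axes=1)
```

Climatology, as component 0, plays the role of the prior. Because it assigns non-zero probability to every category, the pooled likelihood stays positive. A GCM that adds little over climatology ends up with little weight.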

The canonical variates are defined to maximize the ratio of the between-category to within-category variance.

Canonical Variate Analysis

Combining: Based on obs. relative to model

NINO3.4

DEMETER Data

Retrospective seasonal climate forecasts generated with fully coupled ocean-atmosphere models are available from the DEMETER project. Models are initialized using ERA40 atmospheric conditions and an ocean assimilation. Initializations are available for February, May, August, and November, with lead times of up to 5 months (e.g., forecasts initialized in February are available for February – July). For ECMWF, Meteo-France, and Met Office, forecasts for at least the 43-year period 1959-2001 are available, each with 9 ensemble members. The total number of ensemble forecasts used is 43 × 4 × 6 = 1032.

Equal Weighting

Probabilistic forecasts were obtained by counting the number of ensemble members beyond the outer quartiles, and then averaging across the three models.

The pooled ensemble is thus an equally-weighted combination of predictions uncorrected for model skill (although corrected for model drift).

Reliability is good for all three categories.


Multiple Linear Regression
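The MLR approach in the Combining list obtains probabilities from the prediction error variance. Here is a minimal sketch, under these assumptions:
- The predictors are only each model's ensemble mean (the first moment).
- The predictive distribution is Gaussian.
- Category thresholds are quantiles of the training observations. Pass (0.25, 0.75) for the outer-quartile categories used for NINO3.4.

The function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def mlr_category_probs(ens_means_hist, obs_hist, ens_means_now, quantiles=(1 / 3, 2 / 3)):
    """Three-category probabilities from a regression of observations on model ensemble means.

    ens_means_hist: (n_years, n_models) training-period ensemble means
    obs_hist:       (n_years,) verifying observations
    ens_means_now:  (n_models,) ensemble means for the new forecast
    """
    y = np.asarray(obs_hist, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(ens_means_hist, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])  # residual variance; needs n_years > n_models + 1
    x0 = np.concatenate([[1.0], np.asarray(ens_means_now, dtype=float)])
    # Prediction error variance also includes uncertainty in the fitted coefficients
    var_pred = s2 * (1.0 + x0 @ np.linalg.solve(X.T @ X, x0))
    dist = NormalDist(mu=float(x0 @ beta), sigma=float(np.sqrt(var_pred)))
    lo, hi = np.quantile(y, quantiles)
    p_below = dist.cdf(float(lo))
    p_above = 1.0 - dist.cdf(float(hi))
    return np.array([p_below, 1.0 - p_below - p_above, p_above])
```

Higher moments (for example, ensemble spread) could be appended as extra columns of X. That is an extension, not something the slides specify.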

Conclusions I

Virtually no difference in performance was found between the various methods considered for objectively combining the DEMETER forecasts of NINO3.4 SSTs. None of the procedures improves noticeably upon simple averaging of the uncalibrated probabilities. However, the underlying skill level is very high. Sophisticated combination schemes may be more effective in situations of weaker skill (an open question). Attempting to combine all the models at once is clearly unsuccessful; the most effective strategy may be to recalibrate the models individually and then to calculate an equally-weighted average.

Terrestrial Climate

Data

AGCMs: Simulations for 1950-2000
* CCM3 (NCAR) – 24 runs
* ECHAM4.5 (MPI) – 24 runs
* ECPC (Scripps) – 10 runs
* GFDL AM2p12b – 10 runs
* NCEP-MRF9 (NCEP/QDNR) – 10 runs
* NSIPP1 (NASA-GSFC) – 9 runs

Observations: 2m air temperature and precipitation from CRU-UEA (v2.0)

Effect of Probability Treatment
JFM 2m air temperature over land

[Figure: reliability (REL) and resolution (RES) for PDF and RAW probabilities]
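The REL and RES panels on these slides presumably show the reliability and resolution terms of the Brier score decomposition. That is an assumption, since the slides give only the labels. Below is a minimal sketch of the standard Murphy decomposition for one category, such as above normal. Forecasts are grouped by their distinct probability values, which is exact when probabilities come from counting a finite ensemble. The function name is hypothetical.

```python
import numpy as np

def brier_decomposition(p, o):
    """Murphy decomposition BS = REL - RES + UNC for a binary event.

    p: forecast probabilities of the event; o: 1 if the event occurred, else 0.
    Forecasts are grouped by distinct probability value; continuous
    probabilities should be rounded or binned first.
    """
    p = np.asarray(p, dtype=float)
    o = np.asarray(o, dtype=float)
    n = p.size
    obar = o.mean()                                   # base rate of the event
    values, idx = np.unique(p, return_inverse=True)
    counts = np.bincount(idx)                         # n_k: forecasts issued at each value
    obs_freq = np.bincount(idx, weights=o) / counts   # o_k: observed frequency at each value
    rel = np.sum(counts * (values - obs_freq) ** 2) / n  # reliability: lower is better
    res = np.sum(counts * (obs_freq - obar) ** 2) / n    # resolution: higher is better
    unc = obar * (1.0 - obar)                            # uncertainty: independent of forecast
    bs = np.mean((p - o) ** 2)
    return bs, rel, res, unc
```

With RAW probabilities from a single m-member model, there are at most m + 1 distinct values, so the grouping is exact. The (values, obs_freq) pairs are also the points plotted on a reliability diagram. Under this decomposition, a recalibration that lowers REL but also lowers RES can leave the Brier score nearly unchanged.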

Effect of Combination Method
RAW Probabilities

[Figure: reliability (REL) and resolution (RES) by combination method]

Effect of Combination Method
PDF Probabilities

[Figure: reliability (REL) and resolution (RES) by combination method]

Conclusions II

• Reliability of N models pooled together, with uncalibrated PDFs, is better than any individual AGCM.

• Gaussian (PDF) recalibration gives some improvement, but Bayesian recalibration gives the greatest benefit.

• Reliability is typically gained at the expense of resolution.

ISSUES

• Number of Models

• Length of training period

• When is simple complex enough?

Effect of Number of Models
3 vs. 6 AGCMs; 45-year training period

(Robertson et al., 2004, MWR)

Different approaches give more similar results with more models.

RPSS for 2m Temperature
Bayesian MM from raw probabilities: 6 models, 45-yr training

[Panels: Jan-Feb-Mar, Jul-Aug-Sep]

RPSS for Precipitation
Bayesian & Pooled MM from raw probabilities: 6 models, 45-yr training

[Panels: Jan-Feb-Mar, Jul-Aug-Sep]

Reliability Diagrams
* Several methods yield similar results over the United States.
* MMs are remarkably reliable over the US, even though the accuracy is not high.

CONCLUSIONS III

• MM simulations over the US are remarkably reliable, even if they are not terribly accurate.

• Simple pooling of the AGCMs, with uncalibrated probabilities, performs as well as any of our techniques over the U.S.

• Simple pooling doesn't require a long history, but a large number of models (>5?) is desirable.

GRAND CONCLUSIONS

• Overall, we find that recalibrating individual models gives better results than putting models together in a complex combination algorithm.

• In comparing different recalibration/combination methods, we find that generally a gain in reliability is countered with a loss in resolution.

• More complicated approaches are not necessarily better. This needs to be investigated for different forecast situations (e.g., variables, regions, seasons).

Ranked Probability Skill Scores: Temperature

Jan-Feb-Mar (1950-1995)

Ranked Probability Skill Scores: Precipitation

Jul-Aug-Sep (1950-1999)

Comparing treatment of probability
- Even with 6 models, there are regions of large negative RPSS, which suggests common model errors
- Recalibration reduces, but does not eliminate, large errors
- Some improvement of positive skill

[Panel: Recal - Raw]

Ranked Probability Skill Scores: Precipitation

Jul-Aug-Sep (1950-1999)

Comparing combination methods
- Performance-based combination eliminates large errors
- More improvement of positive skill
- More cases of negative skill turned to positive skill


A number of statistical techniques involve calculating linear combinations (weighted sums) of variables. The weights are defined to achieve specific objectives:

• PCA – weighted sums maximize variance

• CCA – weighted sums maximize correlation

• CVA – weighted sums maximize discrimination

Canonical Variate Analysis

Let X be a set of centred explanatory variables, with variance-covariance matrix $S_{xx}$. Let Y be a set of (non-centred) indicator variables that define the group membership of a set of observations, with cross-products matrix $S_{yy}$, and let $S_{xy} = S_{yx}^{\mathsf{T}}$ be the cross-covariance matrix of X and Y. Solve the eigenproblem:

$\left(S_{xx}^{-1} S_{xy} S_{yy}^{-1} S_{yx} - r^{2} I\right) a = 0,$

which is identical to the CCA problem, but with Y as the set of indicator variables rather than continuous values on dependent variables. The loadings, A, maximize the ratio of between-group variance to total variance; this ratio is the squared canonical correlation, $r^{2}$.
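The eigenproblem above can be solved directly with NumPy. This minimal sketch makes several assumptions:
- All covariance matrices use a common 1/n normalization.
- Every group has at least one member.
- $S_{xx}$ is non-singular.
- Group labels are integers 0..K-1.

The function name is hypothetical.

```python
import numpy as np

def canonical_variates(X, groups, n_groups):
    """CVA via the CCA eigenproblem with indicator variables.

    X:      (n_obs, n_vars) explanatory variables (centred inside the function)
    groups: (n_obs,) integer labels 0..n_groups-1; every group must occur
    Returns eigenvalues r^2 (descending) and loadings A (columns = canonical variates).
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups, dtype=int)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                 # centred explanatory variables
    Y = np.zeros((n, n_groups))
    Y[np.arange(n), groups] = 1.0           # non-centred indicator variables
    Sxx = Xc.T @ Xc / n
    Sxy = Xc.T @ Y / n
    Syy = Y.T @ Y / n                       # diagonal: group proportions
    # M = Sxx^-1 Sxy Syy^-1 Syx
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    r2, A = np.linalg.eig(M)                # M is similar to a symmetric PSD matrix: real eigenvalues
    order = np.argsort(r2.real)[::-1]
    return r2.real[order], A.real[:, order]
```

Because X is centred, the columns of $S_{xy}$ sum to zero. As a result, at most K - 1 eigenvalues are non-zero. For tercile categories, that leaves at most two informative canonical variates.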

Effect of Probability Treatment

PDF: $X_i \sim N\left(\mu_{X_i}, \sigma_X^{2}\right), \quad \sigma_X^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \mu_{X_i}\right)^{2}$
