ENSO Model Validation Using Wavelet Probability AnalysisPredicting changes to the El Nino/Southern~ Oscillation (ENSO) has important societal implications, including drought management

Generated using version 3.0 of the official AMS LATEX template

ENSO Model Validation Using Wavelet Probability Analysis

Samantha Stevenson ∗ and Baylor Fox-Kemper

Department of Atmospheric and Oceanic Sciences, University of Colorado, Boulder, CO USA

Markus Jochum

National Center for Atmospheric Research, Boulder, Colorado, USA

Balaji Rajagopalan

Department of Civil, Environmental and Architectural Engineering, University of Colorado, Boulder, CO USA

Stephen G. Yeager

National Center for Atmospheric Research, Boulder, Colorado, USA

∗Corresponding author address: Samantha Stevenson, CIRES, 216 UCB, Boulder, CO 80303

E-mail: [email protected]

1

ABSTRACT

A new method to quantify changes in El Nino/Southern Oscillation (ENSO) variability

is presented, using the overlap between probability distributions of the wavelet spectrum

as measured by the ‘wavelet probability index’ (WPI). Examples are provided using long

integrations of two coupled climate models (CCSM3.5 and CM2.1); when subsets of NINO3.4

time series are compared, the width of the confidence interval on WPI has an exponential

dependence on the length of the subset used, with a statistically identical slope for both

models. This relation may be used to calculate the necessary run length for a given accuracy

in ENSO representation. Applying hypothesis testing techniques to the WPI distributions

from model subsets and from comparisons of model subsets to the historical NINO3.4 index

then provides statistically robust comparisons of relative model agreement; implications for

future model tuning are discussed.

1

1. Introduction

Predicting changes to the El Nino/Southern Oscillation (ENSO) has important societal

implications, including drought management in the American Southwest (Seager 2007; Tren-

berth et al. 1998; Ropelewski and Halpert 1996). However, accurate prediction is limited by

the short extent of observations in the tropical Pacific (Guilyardi et al. 2009); both mod-

eling (Wittenberg 2009) and observational (Meinen and McPhaden 2000; McPhaden 1999;

Zhang and McPhaden 1995) studies agree that modulations in ENSO dynamics occur on

long timescales, meaning that longer records are necessary to capture the full behavior of

the system.Paleoproxies are often used to extend the temporal baseline, but their use may

be complicated by observational effects (McGregor and Gagan 2004; Brown et al. 2008).

Long coupled climate model integrations are presently one of the few remaining options

for studying long-term ENSO variability. Coupled models suffer from some biases (Capotondi

et al. 2006), but the present generation of coupled models shows increased accuracy. In

particular, the updated version of NCAR’s Community Climate System Model (hereafter

CCSM3.5) (Neale et al. 2008) is much improved relative to the IPCC AR4-class climate

models at both fine and coarse resolutions (Jochum et al. 2009a); the T31x3 CCSM3.5 is

therefore relatively inexpensive while still as accurate as any present model.

This paper uses long integrations of the T31x3 CCSM3.5 to illustrate a new, wavelet-

based probabilistic model validation method, capable of dealing with skewed and temporally

variable distributions and useful both for ENSO and for other climate indices. Traditional

tests (χ2 or Kolmogorov-Smirnov) are not suitable for non-Gaussian distributions; however,

wavelet probability analysis can provide quantitative statistical measures even for highly

2

nonnormal distributions of spectral power. This method is extremely versatile: it may be

used to predict the necessary length for a model run (Section a), to quantify agreement

between a model and observations (Section b), or to examine the relative performance

of multiple models compared to observations (Section c).

2. Wavelet Probability Analysis

This method relies on the probability distribution function (PDF) of wavelet power.

Here, NINO3.4 SST from a 1200-year integration of the CCSM3.5, hereafter ‘CCSMcontrol’,

forms the primary dataset. CCSMcontrol is configured as in (Jochum et al. 2009b) and

validated against the monthly gridded SST product of Large and Yeager (2004) (hereafter

the CORE hindcast), covering the period from 1949-2003 and chosen for convenience; other

data products can easily be used as well.

Figure 1 shows the PDF of wavelet power, generated using the wavelet toolkit of Torrence

and Compo (1998). The CORE hindcast lies close to the median for the model run: the

model and data compare well. Some offsets do remain at long periods, most likely due to

errors in CCSM3.5’s representation of ENSO or other decadal variability (i.e. the Pacific

Decadal Oscillation) but with some potential contribution from undersampling the true range

of ENSO dynamics. Wavelet probability analysis allows us to distinguish data/model offsets

from what would be expected due to natural variability.

Let f1(σ, ν) and f2(σ, ν) be two PDFs of wavelet power σ at frequency ν. Then the

joint PDF F (σ, ν) is the probability that a given level of wavelet power is observed in both

datasets at frequency ν, and the integral of F (σ, ν) is the overlap between the two. We refer

3

to the latter quantity as the wavelet probability index, or WPI1:

WPI(ν) =

∫ ∞

0

F (σ, ν)dσ =

∫ ∞

0

f1(σ, ν)f2(σ, ν)dσ (1)

assuming that the two wavelet PDFs f1 and f2 are independent. By definition, WPI lies

between 0 and 1, and measures statistical agreement between time series. WPI can be used

to measure internal variability (“self-overlap”; Section a), or to quantify agreement between

records: for example, model simulations, or a model vs. data (Sections b and c).

The choice of wavelet basis has a minor effect on the results; here, we use the Mexican

hat, or ‘derivative of Gaussian’, wavelet, of degree 2 (Daubechies 1990):

Ψ(η) =−1√Γ(5

2)

d2

dη2(e−η2

2 ) (2)

where η is the nondimensionalized time parameter. We note that the known bias in the

wavelet spectrum (Liu et al. 2007) does not affect the results of later tests.

The relevant steps for this analysis are as follows:

i. Choose the two time series to compare (e.g., subsets of a model vs. entire run, subsets

of a model vs. data).

ii. Create a time series for the region of interest.

iii. Perform a wavelet analysis on the two time series.

iv. Compute the probability distribution function of the wavelet power, for all time series

of interest.

1One can also integrate WPI over frequency to obtain a single value, but this loses useful information.

4

v. Calculate the WPI according to Equation 1.

vi. Subsample the data to find the WPI distribution due to internal variability. Confidence

intervals at the 1 − α significance level may then be obtained using the α2

and 1 − α2

percentiles of the WPI distribution. Alternatively, Tables 2 and 3 may be used where

subsampling is impractical (i.e. for short data records).

Steps 1-6 yield a quantitative measure of spectral agreement between time series, accom-

panied by well-defined significance levels. In this sense, the wavelet probability method is a

natural extension of the qualitative estimates of model/observed ENSO agreement of Neale

et al. (2008).

Three examples of using wavelet probability analysis are presented here using the NINO3.4

wavelet PDF: a self-overlap calculation (Section a), a data/model comparison (Section b),

and a demonstration of the use of hypothesis testing to accept or reject a climate model

based on ENSO variability (Section c) are shown. A suite of Matlab codes developed for

this purpose have been used in all three calculations2.

a. Self-Overlap

Measuring the WPI range between subsamples of a time series yields the expected degree

of self-agreement as a function of time series length, which allows a prediction of the length

needed for a given level of accuracy. The 90% confidence interval is then the distance between

the 5th and 95th percentiles of the resulting WPI distribution (shown for CCSMcontrol in

Figure 2, upper left).

2http://atoc.colorado.edu/˜slsteven/Toolkit.html

5

Subintervals of a time series are by definition drawn from the same distribution. There-

fore, the upper limit of the WPI distribution should approach 1 for long subintervals, a

behavior which is indeed observed in Figure 2. It is also found that the width of the confi-

dence interval on WPI (Wim) and the model subinterval length L are exponentially related

(Figure 2, upper right-hand panel):

lnWim = β0 + β1L (3)

This relation holds across climate models, as demonstrated using a 2000-year integration

of the GFDL CM2.1, a fully coupled GCM similar to CCSMcontrol but with higher resolution

and different physics (Wittenberg et al. 2006; Wittenberg 2009). Due to internal model

physics, the intercept β0 is itself a function of run length; however, the slope of Equation 3

is statistically indistinguishable between CCSM3.5 and GFDL CM2.1. Equation 3 may be

used to predict the necessary run time for a given coupled model, for any desired level of

self-agreement. For example, to sample 90% of the true ENSO variability, find the value of L

in Equation 3 where Wim = 0.1 (ln Wim = −2.3). This is roughly 250 years for both models,

indicating that 250 years is a good baseline for long simulations. As a rule of thumb, if the

self-overlap WPI distribution is too wide relative to Table 2 by a factor of 2, then the model

must run an additional 80 years.

Typical WPI ranges are given in Table 2 for various subinterval lengths; as a general

rule, if measured WPI values for a NINO3.4 time series fall within the tabulated range, then

the model is performing ‘well’ at that significance level. (Note that if the numerical values

of β0 and β1 in Equation 3, as well as the values in Table 2, will change if a different index

6

is used.) β1 from Table 1 can also yield the self-overlap confidence interval for any arbitrary

model length, given β0 from a shorter, calibration run. New versions of coupled climate

models (for example, those currently in development for the IPCC AR5 report) can thus be

validated against long integrations according to their WPI distributions; using hypothesis

testing to find more precise significance levels is discussed in Section c.

b. Validation Against Data

Estimating the expected agreement between distinct time series (for example, a model

and observations) as a function of their lengths is another use of wavelet probability analysis,

which helps prevent ‘overtuning’ models to a short observational record. The method follows

Section a, except that now the WPI values are derived from the entirety of the CORE

hindcast to subintervals of various lengths taken from the model integrations.

Figure 2 (panels c and d) shows model/data agreement for CCSMcontrol and CM2.1:

below 5 years, WPI ranges from 40-80%, and much lower from 8-12 years. CM2.1’s lower

agreement with CORE relative to CCSM is consistent with CM2.1’s known overestimate of

ENSO amplitude (Wittenberg et al. 2006). However, the upper bound of WPI never reaches

1 for either model/data comparison; both models differ from CORE.

Figure 2 shows that for 50-year model subintervals, the CORE/model and model/model

confidence intervals overlap; the models are indistinguishable from the data. In contrast, for

intervals longer than 200 years, self-overlap and model/data WPI confidence intervals do

not overlap; runs (or data records!) longer than 200-300 years are needed to identify real

offsets, a result which will be made more precise in the next section. In general, rather than

7

tuning as closely as possible to observations, tuning the model to lie inside of the range of

acceptable agreement (Table 3) may be most appropriate.

c. Empirical Hypothesis Testing

The power of this method is the ability to specify the significance level at which two

time series disagree, which is done through hypothesis testing on WPI distributions (i.e.

Figure 2). Empirical methods are used, since using traditional hypothesis tests often yields

misleading results. The WPI distributions of Sections a and b can be highly nonnormal (see

Figure 1), and even the nonparametric Kolmogorov-Smirnov (K-S) test cannot necessarily

be relied on, since samples drawn from different distributions cannot be dismissed without

a priori knowledge of the ‘correct’ distribution. Steps are as follows:

a. Determine the type of test to perform: model/model or model/data.

b. Create the appropriate WPI distributions from subsets of the input time series. For

a model/data comparison, model self-overlap (Section a) will be tested against the

model/data WPI distribution (Section b). For a model/model comparison, the two

model/data distributions will be compared.

c(1). To determine whether two distributions differ at significance level α, compute the α2

to 1− α2

confidence intervals on the two WPI distributions. If these intervals overlap,

the distributions are equivalent; otherwise, they differ.

c(2). To determine the level of confidence one may have in differences between distributions,

repeat step c at many values of α.The largest α for which the confidence intervals

8

overlap is then equivalent to the smallest significance level at which the distributions

differ. Where αmax ≤ 0.1 (1 - αmax ≥ 0.9), for example, the null would be rejected

at the 90% level. In the limit of identical distributions, αmax (minimum significance)

approaches 1 (0); when there is no overlap, αmax (minimum significance) approaches 0

(1).

The end result of Steps a-c(1) is a map of locations in parameter space where the two

time series differ at confidence level α. If Step c(2) is used, a map of the confidence level

at which the time series differ results. The effects of changing model parameters may be

immediately seen (e.g., Figure 3). Test cases, where the CORE hindcast is tested against

a version of itself ‘contaminated’ with an AR(1) ‘red noise’ spectrum of varying amplitude

(not pictured), yield reliable results; CORE does not differ from itself by this metric.

Validation is then performed on three model runs: CCSMcontrol, the CM2.1 run dis-

cussed earlier, and an additional 600-year CCSM run using a lower value of the threshold

relative humidity for cloud formation, hereafter ‘RHLOW’. Frequency ‘bleeding’ is prevented

by using model subintervals of the same length as CORE (in this case, 55 years); results

are found in Figure 3 (left) where horizontal lines indicate differences at the 80, 90 and

95% levels. CCSMcontrol agrees relatively well with CORE everywhere except the 6-12 year

band. RHLOW does somewhat better in the 6-12 year band, but does not agree as well with

CORE at longer periods. Both CCSM runs agree more strongly with CORE in the 2-8 year

band than does CM2.1, but all models perform poorly at 8-12 years.

Model/model comparison is then performed for CCSMcontrol/CM2.1 and CCSMcon-

trol/RHLOW (Figure 3, right): CCSMcontrol and CM2.1 differ throughout the 4-10 year

9

band, but only at long (≥ 200 year) subinterval lengths. In contrast, for the CCSMcon-

trol/RHLOW comparison, long-period agreement is generally good, and the areas of dis-

agreement are smaller than for CCSMcontrol/CM2.1. CCSMcontrol and RHLOW disagree

at 2-8 years for subintervals longer than 200 years, and RHLOW shows better general agree-

ment with CORE for shorter periods. CCSMcontrol may therefore be considered less accu-

rate for short-period ENSO. The reverse is true for the 5-8 year band, where CCSMcontrol is

more consistent with CORE. Likewise for CCSMcontrol vs. CM2.1, where CCSM shows bet-

ter overall agreement with data yet the models disagree with one another, this test indicates

that CCSMcontrol does a better job representing ENSO variability.

The above test cases form ‘sanity checks’, in that CCSM runs are closer to one another

than to CM2.1. Also, an ‘intermediate’ comparison case (not pictured) shows intermediate

results: a test run using the dynamic chlorophyll feedback of Jochum (2009) differs from

CCSMcontrol at 85% significance throughout the ENSO band. We expect this method to

usefully quantify true physical differences between models.

3. Conclusions

Wavelet probability analysis is a robust method of measuring agreement between one or

more data sets. Using the PDF of the NINO3.4 wavelet power, CCSM3.5 is seen to agree

well with the ocean hindcast product of Large and Yeager (2004), lending credence to the

use of this model as a baseline for the study of long-term ENSO variability.

Self-agreement depends on the record length; the 90% confidence interval on the self-

overlap WPI distribution narrows exponentially with record length, and in general halves

10

every 80 years. Using a 1,200 year run of the CCSM3.5 and a 2,000 year run of the GFDL

CM2.1, statistically identical regressions are found; this property may be exploited to provide

the expected level of agreement for a model run of arbitrary length. 250 years is typically

sufficient to illustrate 90% of the range of ENSO behavior, and should be viewed as a

minimum length for future ‘long’ baseline simulations.

Tuning shorter model runs is demonstrated using an empirical hypothesis testing pro-

cedure on CCSM and CM2.1, using the ocean hindcast of Large and Yeager (2004) as a

reference. CCSM is more likely to agree with the instrumental record than the GFDL CM2.1

at short periods; however, CCSM and CM2.1 are consistent at periods longer than 12 years.

Differences between CCSM3.5 and CM2.1 at some frequencies are detectable only for model

subintervals longer than 200 years; this is the suggested minimum length for model inter-

comparison studies. More dramatic changes to model parameters lead to more dramatic

inter-model differences, providing evidence that the method is sensitive to the degree of

physical changes.

Wavelet probability analysis is a simple but powerful tool which provides robust statistical

limits on the expected level of agreement between time series of any length, from any source;

this technique should prove to be very useful for the development of future climate models.

Acknowledgments.

SS is supported by the NASA Earth & Space Science Fellowship (NESSF). A. Wittenberg

is gratefully acknowledged for providing the NINO3.4 time series from GFDL CM2.1.

11

REFERENCES

Brown, J., A. W. Tudhope, M. Collins, and H. V. McGregor, 2008: Mid-Holocene ENSO:

Issues in quantitatve model-proxy data comparisons. Paleoceanography, 23, PA3202.

Capotondi, A., A. Wittenberg, and S. Masina, 2006: Spatial and temporal structure of Trop-

ical Pacific interannual variability in 20th century coupled simulations. Ocean Modelling,

15, 274–298.

Daubechies, I., 1990: The wavelet transform, time-frequency localization and signal analysis.

IEEE Trans. Inform. Theory, 36, 961–1004.

Guilyardi, E., A. Wittenberg, A. Fedorov, M. Collins, C. Wang, A. Capotondi, G. Jan van

Oldenborgh, and T. Stockdale, 2009: Understanding el nino in ocean-atmosphere general

circulation models: Progress and challenges. BAMS, 325–340.

Jochum, M., 2009: Impact of latitudinal variations in vertical diffusivity on climate simula-

tions. Journal of Geophysical Research - Oceans, 114, C01 010.

Jochum, M., B. Fox-Kemper, P. Molnar, and C. Shields, 2009a: Differences in the Indonesian

seaway in a coupled climate model and their relevance to Pliocene climate and El Nino.

Paleoceanography, 24, PA1212.

Jochum, M., S. Yeager, K. Lindsay, K. Moore, and R. Murtugudde, 2009b: Quantification

of the feedback between phytoplankton and ENSO in the Community Climate System

Model. Journal of Climate.

12

Large, W. G. and S. G. Yeager, 2004: Diurnal to decadal global forcing for ocean and

sea-ice models: the data sets and flux climatologies. NCAR Technical Note, NCAR/TN–

460/STR.105.

Liu, Y., X. S. Lian, and R. H. Weisberg, 2007: Rectification of the bias in the wavelet power

spectrum. Journal of Atmospheric and Oceanic Technology, 24, 2093–2102.

McGregor, H. V. and M. K. Gagan, 2004: Western Pacific coral δ18O records of anomalous

Holocene variability in the El Nino-Southern Oscillation. Geophysical Research Letters,

31, L11 204.

McPhaden, M. J., 1999: Genesis and evolution of the 1997-98 El Nino. Science, 283, 950–

954.

Meinen, C. S. and M. J. McPhaden, 2000: Observations of warm water volume changes

in the equatorial Pacific and their relationship to El Nino and La Nina. J. Clim., 13,

3551–3559.

Neale, R. B., J. H. Richter, and M. Jochum, 2008: The impact of convection on ENSO:

From a delayed oscillator to a series of events. Journal of Climate, submitted.

Ropelewski, C. F. and M. S. Halpert, 1996: Quantifying Southern Oscillation-precipitation

relationships. Journal of Climate, 9, 1043 1059.

Seager, R., 2007: The turn of the century north American drought: Global context, dynam-

ics, and past analogs. Journal of Climate, 20, 5527–5552.

13

Torrence, C. and G. Compo, 1998: A practical guide to wavelet analysis. Bull. Amer. Meteor.

Soc., 79, 61–78.

Trenberth, K. E., G. W. Branstator, D. Karoly, A. Kumar, N.-C. Lau, and C. Ropelewski,

1998: Progress during TOGA in understanding and modeling global teleconnections as-

sociated with tropical sea surface temperatures. Journal of Geophysical Research, 103,

14,291–14,324.

Wittenberg, A. T., 2009: Are historical records sufficient to constrain ENSO simulations?

Geophysical Research Letters, 36, L12 702.

Wittenberg, A. T., A. Rosati, N.-C. Lau, and J. J. Ploshay, 2006: Gfdls cm2 global coupled

climate models. part iii: Tropical pacific climate and enso. Journal of Climate, 19, 698–

722.

Zhang, X. and M. J. McPhaden, 1995: Wind stress variations and interannual sea surface

temperature anomalies in the eastern equatorial pacific. Journal of Climate, 19, 226–241.

14

List of Tables

1 Dependence of the 90% WPI confidence interval width on model subinterval

length, from confidence intervals averaged over the 2-6 year band. ∆β0 and

∆β1 refer to the bounds of the 90% confidence intervals on those coefficients. 16

2 WPI values for CCSMcontrol self-overlap calculation as a function of subin-

terval length L at a variety of confidence levels, averaged over the 2-6 year

band. 17

3 WPI values for CCSMcontrol model/data calculation at a variety of confidence

levels, averaged over the 2-6 year band. 18

15

Table 1. Dependence of the 90% WPI confidence interval width on model subintervallength, from confidence intervals averaged over the 2-6 year band. ∆β0 and ∆β1 refer to thebounds of the 90% confidence intervals on those coefficients.

Run β0 β1 ∆β0 ∆β1

CCSMcontrol -0.553 -0.0078 -0.920 - -0.185 -0.0091 - -0.0065GFDL CM2.1 -0.237 -0.0098 -0.458 - -0.015 -0.011 - -0.0090

16

Table 2. WPI values for CCSMcontrol self-overlap calculation as a function of subintervallength L at a variety of confidence levels, averaged over the 2-6 year band.

L 2.5% 5% 10% 90% 95% 97.5%50 0.5007 0.5098 0.5229 0.6296 0.6393 0.6478100 0.6604 0.6646 0.6739 0.7433 0.7466 0.7484200 0.8147 0.8156 0.8172 0.8384 0.8402 0.8415400 0.9201 0.9203 0.9208 0.9262 0.9265 0.9269

17

Table 3. WPI values for CCSMcontrol model/data calculation at a variety of confidencelevels, averaged over the 2-6 year band.

Run length (yrs) 2.5% 5% 10% 90% 95% 97.5%50 0.607 0.647 0.709 0.953 0.965 0.972100 0.751 0.778 0.805 0.956 0.966 0.973200 0.824 0.834 0.848 0.953 0.961 0.964400 0.865 0.869 0.875 0.933 0.936 0.938

18

List of Figures

1 Probability distribution functions for mean NINO3.4 wavelet power. The

white line represents the median value for the model run, while the gray

line is the mean value generated using the CORE hindcast. Dashed black

lines correspond to the 25th and 75th percentile values for the model run

(interquartile range). 20

2 Left-hand panels (a,b): 90% confidence interval on WPI distributions for self-

overlap calculations (a = CCSMcontrol, b = CM2.1). Center panels (c,d):

same as left-hand panels, for model/data WPI distributions. Right-hand pan-

els (e,f) show regression of 90% confidence interval widths against subinterval

length, for self-overlap calculations (top) and data/model comparisons (bot-

tom). In panels e and f, CCSMcontrol data appears as red X’s, CM2.1 as blue

squares. 21

3 Results of hypothesis testing procedure. Left panels: validation of CCSM-

control (top), RHLOW (middle) and CM2.1 (bottom) against the CORE

hindcast. Right panels: comparison of CCSMcontrol vs. CM2.1 (top) and

CCSMcontrol vs. RHLOW (bottom). In all panels, confidence levels plotted

range from 0 (agreement) to 1 (disagreement). 22

19

Fig. 1. Probability distribution functions for mean NINO3.4 wavelet power. The whiteline represents the median value for the model run, while the gray line is the mean valuegenerated using the CORE hindcast. Dashed black lines correspond to the 25th and 75thpercentile values for the model run (interquartile range).

20

Fig. 2. Left-hand panels (a,b): 90% confidence interval on WPI distributions for self-overlapcalculations (a = CCSMcontrol, b = CM2.1). Center panels (c,d): same as left-hand panels,for model/data WPI distributions. Right-hand panels (e,f) show regression of 90% confidenceinterval widths against subinterval length, for self-overlap calculations (top) and data/modelcomparisons (bottom). In panels e and f, CCSMcontrol data appears as red X’s, CM2.1 asblue squares.

21

Fig. 3. Results of hypothesis testing procedure. Left panels: validation of CCSMcontrol(top), RHLOW (middle) and CM2.1 (bottom) against the CORE hindcast. Right panels:comparison of CCSMcontrol vs. CM2.1 (top) and CCSMcontrol vs. RHLOW (bottom). Inall panels, confidence levels plotted range from 0 (agreement) to 1 (disagreement).

22

Documents

ENSO Model Validation Using Wavelet Probability AnalysisPredicting changes to the El Nino/Southern~ Oscillation (ENSO) has important societal implications, including drought management