Generated using version 3.0 of the official AMS LATEX template
ENSO Model Validation Using Wavelet Probability Analysis
Samantha Stevenson ∗ and Baylor Fox-Kemper
Department of Atmospheric and Oceanic Sciences, University of Colorado, Boulder, CO USA
Markus Jochum
National Center for Atmospheric Research, Boulder, Colorado, USA
Balaji Rajagopalan
Department of Civil, Environmental and Architectural Engineering, University of Colorado, Boulder, CO USA
Stephen G. Yeager
National Center for Atmospheric Research, Boulder, Colorado, USA
∗Corresponding author address: Samantha Stevenson, CIRES, 216 UCB, Boulder, CO 80303
E-mail: [email protected]
ABSTRACT
A new method to quantify changes in El Nino/Southern Oscillation (ENSO) variability
is presented, using the overlap between probability distributions of the wavelet spectrum
as measured by the ‘wavelet probability index’ (WPI). Examples are provided using long
integrations of two coupled climate models (CCSM3.5 and CM2.1); when subsets of NINO3.4
time series are compared, the width of the confidence interval on WPI has an exponential
dependence on the length of the subset used, with a statistically identical slope for both
models. This relation may be used to calculate the necessary run length for a given accuracy
in ENSO representation. Applying hypothesis testing techniques to the WPI distributions
from model subsets and from comparisons of model subsets to the historical NINO3.4 index
then provides statistically robust comparisons of relative model agreement; implications for
future model tuning are discussed.
1. Introduction
Predicting changes to the El Nino/Southern Oscillation (ENSO) has important societal
implications, including drought management in the American Southwest (Seager 2007; Tren-
berth et al. 1998; Ropelewski and Halpert 1996). However, accurate prediction is limited by
the short extent of observations in the tropical Pacific (Guilyardi et al. 2009); both mod-
eling (Wittenberg 2009) and observational (Meinen and McPhaden 2000; McPhaden 1999;
Zhang and McPhaden 1995) studies agree that modulations in ENSO dynamics occur on
long timescales, meaning that longer records are necessary to capture the full behavior of
the system. Paleoproxies are often used to extend the temporal baseline, but their use may
be complicated by observational effects (McGregor and Gagan 2004; Brown et al. 2008).
Long coupled climate model integrations are presently one of the few remaining options
for studying long-term ENSO variability. Coupled models suffer from some biases (Capotondi
et al. 2006), but the present generation of coupled models shows increased accuracy. In
particular, the updated version of NCAR’s Community Climate System Model (hereafter
CCSM3.5) (Neale et al. 2008) is much improved relative to the IPCC AR4-class climate
models at both fine and coarse resolutions (Jochum et al. 2009a); the T31x3 CCSM3.5 is
therefore relatively inexpensive while still as accurate as any present model.
This paper uses long integrations of the T31x3 CCSM3.5 to illustrate a new, wavelet-
based probabilistic model validation method, capable of dealing with skewed and temporally
variable distributions and useful both for ENSO and for other climate indices. Traditional
tests (χ2 or Kolmogorov-Smirnov) are not suitable for non-Gaussian distributions; however,
wavelet probability analysis can provide quantitative statistical measures even for highly
nonnormal distributions of spectral power. This method is extremely versatile: it may be
used to predict the necessary length for a model run (Section a), to quantify agreement
between a model and observations (Section b), or to examine the relative performance
of multiple models compared to observations (Section c).
2. Wavelet Probability Analysis
This method relies on the probability distribution function (PDF) of wavelet power.
Here, NINO3.4 SST from a 1200-year integration of the CCSM3.5, hereafter ‘CCSMcontrol’,
forms the primary dataset. CCSMcontrol is configured as in Jochum et al. (2009b) and
validated against the monthly gridded SST product of Large and Yeager (2004) (hereafter
the CORE hindcast), covering the period from 1949-2003 and chosen for convenience; other
data products can easily be used as well.
Figure 1 shows the PDF of wavelet power, generated using the wavelet toolkit of Torrence
and Compo (1998). The CORE hindcast lies close to the median for the model run: the
model and data compare well. Some offsets do remain at long periods, most likely due to
errors in CCSM3.5’s representation of ENSO or other decadal variability (e.g., the Pacific
Decadal Oscillation) but with some potential contribution from undersampling the true range
of ENSO dynamics. Wavelet probability analysis allows us to distinguish data/model offsets
from what would be expected due to natural variability.
Let f1(σ, ν) and f2(σ, ν) be two PDFs of wavelet power σ at frequency ν. Then the
joint PDF F (σ, ν) is the probability that a given level of wavelet power is observed in both
datasets at frequency ν, and the integral of F (σ, ν) is the overlap between the two. We refer
to the latter quantity as the wavelet probability index, or WPI¹:
\[
\mathrm{WPI}(\nu) = \int_0^\infty F(\sigma,\nu)\,d\sigma = \int_0^\infty f_1(\sigma,\nu)\,f_2(\sigma,\nu)\,d\sigma \qquad (1)
\]
assuming that the two wavelet PDFs f1 and f2 are independent. By definition, WPI lies
between 0 and 1, and measures statistical agreement between time series. WPI can be used
to measure internal variability (“self-overlap”; Section a), or to quantify agreement between
records: for example, model simulations, or a model vs. data (Sections b and c).
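A minimal Python sketch of Equation 1 at a single frequency may help fix ideas. It is an illustration only, not the authors' Matlab toolkit: the function name `wavelet_probability_index`, the shared histogram grid, the bin count, and the normalization of each histogram to unit total probability are all assumptions made here, and the normalization used in the toolkit may differ.

```python
import numpy as np

def wavelet_probability_index(power1, power2, bins=50):
    """Discrete analogue of Equation 1 at one frequency (illustrative sketch).

    power1, power2: 1-D arrays of wavelet power samples at that frequency.
    Both samples are binned on a shared grid, each histogram is normalized
    to unit total probability, and the bin-wise products are summed.
    """
    power1 = np.asarray(power1, dtype=float)
    power2 = np.asarray(power2, dtype=float)
    lo = min(power1.min(), power2.min())
    hi = max(power1.max(), power2.max())
    edges = np.linspace(lo, hi, bins + 1)   # shared bin edges (assumption)
    p1, _ = np.histogram(power1, bins=edges)
    p2, _ = np.histogram(power2, bins=edges)
    p1 = p1 / p1.sum()                      # probability mass per bin
    p2 = p2 / p2.sum()
    return float(np.sum(p1 * p2))

# Hypothetical stand-ins for wavelet power at one frequency (not real NINO3.4 data).
rng = np.random.default_rng(0)
sample_a = rng.chisquare(2, size=5000)
sample_b = rng.chisquare(2, size=5000)
wpi_ab = wavelet_probability_index(sample_a, sample_b)
```

With probability masses rather than densities, the sum of products is bounded by 0 and 1, and samples with no shared bins give exactly 0.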
The choice of wavelet basis has a minor effect on the results; here, we use the Mexican
hat, or ‘derivative of Gaussian’, wavelet, of degree 2 (Daubechies 1990):
\[
\Psi(\eta) = \frac{-1}{\sqrt{\Gamma\left(\frac{5}{2}\right)}} \, \frac{d^2}{d\eta^2}\left(e^{-\eta^2/2}\right) \qquad (2)
\]
where η is the nondimensionalized time parameter. We note that the known bias in the
wavelet spectrum (Liu et al. 2007) does not affect the results of later tests.
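As a quick illustration, Equation 2 can be evaluated in closed form, since the second derivative of exp(−η²/2) is (η² − 1) exp(−η²/2). The Python sketch below is an assumption-labeled example (the name `dog2_wavelet` and the integration grid are chosen here for illustration); it also checks numerically that the wavelet has unit energy.

```python
import math
import numpy as np

def dog2_wavelet(eta):
    """Equation 2 with the second derivative written out.

    d^2/d(eta)^2 exp(-eta^2/2) = (eta^2 - 1) exp(-eta^2/2), so
    Psi(eta) = (1 - eta^2) exp(-eta^2/2) / sqrt(Gamma(5/2)).
    """
    eta = np.asarray(eta, dtype=float)
    norm = math.sqrt(math.gamma(2.5))
    return (1.0 - eta**2) * np.exp(-0.5 * eta**2) / norm

# Numerical check of unit energy: the integral of Psi^2 over eta.
# The grid limits and spacing are illustrative choices.
eta_grid = np.linspace(-10.0, 10.0, 20001)
d_eta = eta_grid[1] - eta_grid[0]
energy = float(np.sum(dog2_wavelet(eta_grid) ** 2) * d_eta)
```

The wavelet crosses zero at η = ±1 and peaks at η = 0, which gives it the familiar Mexican hat shape.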
The relevant steps for this analysis are as follows:
i. Choose the two time series to compare (e.g., subsets of a model vs. entire run, subsets
of a model vs. data).
ii. Create a time series for the region of interest.
iii. Perform a wavelet analysis on the two time series.
iv. Compute the probability distribution function of the wavelet power, for all time series
of interest.
¹ One can also integrate WPI over frequency to obtain a single value, but this loses useful information.
v. Calculate the WPI according to Equation 1.
vi. Subsample the data to find the WPI distribution due to internal variability. Confidence
intervals at the 1 − α significance level may then be obtained using the α/2 and 1 − α/2
percentiles of the WPI distribution. Alternatively, Tables 2 and 3 may be used where
subsampling is impractical (i.e. for short data records).
Steps i-vi yield a quantitative measure of spectral agreement between time series, accompanied
by well-defined significance levels. In this sense, the wavelet probability method is a
natural extension of the qualitative estimates of model/observed ENSO agreement of Neale
et al. (2008).
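The subsampling in steps iv-vi can be sketched as follows for a single frequency. This is a hedged illustration rather than the authors' code: the helper names, the random contiguous subintervals, the histogram-based WPI, and the synthetic chi-square "power" series are all assumptions introduced here.

```python
import numpy as np

def wpi(power1, power2, bins=30):
    """Histogram-based stand-in for Equation 1 (normalization is an assumption)."""
    lo = min(power1.min(), power2.min())
    hi = max(power1.max(), power2.max())
    edges = np.linspace(lo, hi, bins + 1)
    p1, _ = np.histogram(power1, bins=edges)
    p2, _ = np.histogram(power2, bins=edges)
    return float(np.sum((p1 / p1.sum()) * (p2 / p2.sum())))

def subsample_wpi_interval(power, length, alpha=0.10, n_draws=500, seed=0):
    """Compare random contiguous subintervals of `power` with the full record.

    Returns the alpha/2 and 1 - alpha/2 percentiles of the resulting WPI
    values, together with the values themselves.
    """
    rng = np.random.default_rng(seed)
    power = np.asarray(power, dtype=float)
    starts = rng.integers(0, power.size - length + 1, size=n_draws)
    values = np.array([wpi(power[s:s + length], power) for s in starts])
    lower, upper = np.percentile(values, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper, values

# Hypothetical wavelet power at one period: 500 years of monthly values.
rng = np.random.default_rng(1)
power = rng.chisquare(2, size=6000)
lower, upper, values = subsample_wpi_interval(power, length=600, n_draws=200)
width_90 = upper - lower   # width of the 90% interval for 50-year subintervals
```

Repeating the call for several values of `length` gives the interval widths as a function of subinterval length.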
Three examples of wavelet probability analysis are presented here using the NINO3.4
wavelet PDF: a self-overlap calculation (Section a), a data/model comparison (Section b),
and a demonstration of the use of hypothesis testing to accept or reject a climate model
based on ENSO variability (Section c). A suite of Matlab codes developed for
this purpose has been used in all three calculations².
a. Self-Overlap
Measuring the WPI range between subsamples of a time series yields the expected degree
of self-agreement as a function of time series length, which allows a prediction of the length
needed for a given level of accuracy. The 90% confidence interval is then the distance between
the 5th and 95th percentiles of the resulting WPI distribution (shown for CCSMcontrol in
Figure 2, upper left).
² http://atoc.colorado.edu/~slsteven/Toolkit.html
Subintervals of a time series are by definition drawn from the same distribution. There-
fore, the upper limit of the WPI distribution should approach 1 for long subintervals, a
behavior which is indeed observed in Figure 2. It is also found that the width of the confi-
dence interval on WPI (Wim) and the model subinterval length L are exponentially related
(Figure 2, upper right-hand panel):
\[
\ln W_{im} = \beta_0 + \beta_1 L \qquad (3)
\]
This relation holds across climate models, as demonstrated using a 2000-year integration
of the GFDL CM2.1, a fully coupled GCM similar to CCSMcontrol but with higher resolution
and different physics (Wittenberg et al. 2006; Wittenberg 2009). Due to internal model
physics, the intercept β0 is itself a function of run length; however, the slope of Equation 3
is statistically indistinguishable between CCSM3.5 and GFDL CM2.1. Equation 3 may be
used to predict the necessary run time for a given coupled model, for any desired level of
self-agreement. For example, to sample 90% of the true ENSO variability, find the value of L
in Equation 3 where Wim = 0.1 (ln Wim = −2.3). This is roughly 250 years for both models,
indicating that 250 years is a good baseline for long simulations. As a rule of thumb, if the
self-overlap WPI distribution is too wide relative to Table 2 by a factor of 2, then the model
must run an additional 80 years.
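The arithmetic behind these run-length estimates can be reproduced directly from Equation 3. The short Python sketch below uses the central values of β0 and β1 from Table 1; the function names are illustrative, and the coefficient uncertainties in Table 1 are ignored.

```python
import math

# Regression coefficients (beta0, beta1) copied from Table 1.
TABLE1 = {
    "CCSMcontrol": (-0.553, -0.0078),
    "GFDL CM2.1": (-0.237, -0.0098),
}

def required_length(beta0, beta1, target_width):
    """Solve ln(W) = beta0 + beta1 * L (Equation 3) for L at W = target_width."""
    return (math.log(target_width) - beta0) / beta1

def halving_length(beta1):
    """Additional run length over which the interval width halves: ln(1/2) / beta1."""
    return math.log(0.5) / beta1

for name, (b0, b1) in TABLE1.items():
    print(f"{name}: L(W = 0.1) = {required_length(b0, b1, 0.1):.0f} yr, "
          f"halving length = {halving_length(b1):.0f} yr")
```

With the Table 1 central values, this gives run lengths of about 224 years (CCSMcontrol) and 211 years (CM2.1), of the same order as the 250-year baseline, and halving lengths of about 89 and 71 years, which bracket the 80-year rule of thumb.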
Typical WPI ranges are given in Table 2 for various subinterval lengths; as a general
rule, if measured WPI values for a NINO3.4 time series fall within the tabulated range, then
the model is performing ‘well’ at that significance level. (Note that the numerical values
of β0 and β1 in Equation 3, as well as the values in Table 2, will change if a different index
is used.) β1 from Table 1 can also yield the self-overlap confidence interval for any arbitrary
model length, given β0 from a shorter, calibration run. New versions of coupled climate
models (for example, those currently in development for the IPCC AR5 report) can thus be
validated against long integrations according to their WPI distributions; using hypothesis
testing to find more precise significance levels is discussed in Section c.
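The use of a calibration run can be illustrated with a small Python sketch. The slope is the CCSMcontrol value of β1 from Table 1; the calibration measurement (a 90% interval width of 0.30 at L = 100 years) is purely hypothetical and stands in for a value that would be measured from a real short run.

```python
import math

BETA1_CCSM = -0.0078  # slope beta1 for CCSMcontrol, from Table 1

def calibrate_beta0(width_cal, length_cal, beta1):
    """Intercept implied by one measured width: beta0 = ln(W_cal) - beta1 * L_cal."""
    return math.log(width_cal) - beta1 * length_cal

def predicted_width(length, beta0, beta1):
    """Equation 3 solved for the width: W = exp(beta0 + beta1 * L)."""
    return math.exp(beta0 + beta1 * length)

# Hypothetical calibration run: width 0.30 measured at L = 100 yr.
beta0 = calibrate_beta0(0.30, 100.0, BETA1_CCSM)
width_300 = predicted_width(300.0, beta0, BETA1_CCSM)  # extrapolated to 300 yr
```

Because the slope is fixed, the predicted width at any other length is simply the calibration width scaled by exp(β1 ΔL).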
b. Validation Against Data
Estimating the expected agreement between distinct time series (for example, a model
and observations) as a function of their lengths is another use of wavelet probability analysis,
which helps prevent ‘overtuning’ models to a short observational record. The method follows
Section a, except that now the WPI values are derived by comparing the entirety of the CORE
hindcast to subintervals of various lengths taken from the model integrations.
Figure 2 (panels c and d) shows model/data agreement for CCSMcontrol and CM2.1:
at periods below 5 years, WPI ranges from 40-80%, and it is much lower at periods of 8-12 years. CM2.1’s lower
agreement with CORE relative to CCSM is consistent with CM2.1’s known overestimate of
ENSO amplitude (Wittenberg et al. 2006). However, the upper bound of WPI never reaches
1 for either model/data comparison; both models differ from CORE.
Figure 2 shows that for 50-year model subintervals, the CORE/model and model/model
confidence intervals overlap; the models are indistinguishable from the data. In contrast, for
intervals longer than 200 years, self-overlap and model/data WPI confidence intervals do
not overlap; runs (or data records!) longer than 200-300 years are needed to identify real
offsets, a result which will be made more precise in the next section. In general, rather than
tuning as closely as possible to observations, tuning the model to lie inside of the range of
acceptable agreement (Table 3) may be most appropriate.
c. Empirical Hypothesis Testing
The power of this method is the ability to specify the significance level at which two
time series disagree, which is done through hypothesis testing on WPI distributions (e.g.,
Figure 2). Empirical methods are used, since using traditional hypothesis tests often yields
misleading results. The WPI distributions of Sections a and b can be highly nonnormal (see
Figure 1), and even the nonparametric Kolmogorov-Smirnov (K-S) test cannot necessarily
be relied on, since samples drawn from different distributions cannot be dismissed without
a priori knowledge of the ‘correct’ distribution. Steps are as follows:
a. Determine the type of test to perform: model/model or model/data.
b. Create the appropriate WPI distributions from subsets of the input time series. For
a model/data comparison, model self-overlap (Section a) will be tested against the
model/data WPI distribution (Section b). For a model/model comparison, the two
model/data distributions will be compared.
c(1). To determine whether two distributions differ at significance level α, compute the α/2
to 1 − α/2 confidence intervals on the two WPI distributions. If these intervals overlap,
the distributions are equivalent; otherwise, they differ.
c(2). To determine the level of confidence one may have in differences between distributions,
repeat step c(1) at many values of α. The largest α for which the confidence intervals
overlap is then equivalent to the smallest significance level at which the distributions
differ. Where αmax ≤ 0.1 (1 - αmax ≥ 0.9), for example, the null would be rejected
at the 90% level. In the limit of identical distributions, αmax (minimum significance)
approaches 1 (0); when there is no overlap, αmax (minimum significance) approaches 0
(1).
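Steps c(1) and c(2) can be sketched in Python as below. This is an illustrative reading of the procedure, not the authors' implementation: the function names, the grid of α values, and the synthetic WPI samples are assumptions made for the example.

```python
import numpy as np

def percentile_interval(samples, alpha):
    """Empirical (alpha/2, 1 - alpha/2) interval of a WPI sample."""
    lower, upper = np.percentile(samples, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper

def intervals_overlap(wpi_a, wpi_b, alpha):
    """Step c(1): True if the two intervals overlap at this alpha."""
    lo_a, hi_a = percentile_interval(wpi_a, alpha)
    lo_b, hi_b = percentile_interval(wpi_b, alpha)
    return bool(lo_a <= hi_b and lo_b <= hi_a)

def alpha_max(wpi_a, wpi_b, alphas=None):
    """Step c(2): largest alpha on the grid with overlapping intervals (0.0 if none)."""
    if alphas is None:
        alphas = np.linspace(0.01, 0.99, 99)   # grid of alpha values (assumption)
    overlapping = [a for a in alphas if intervals_overlap(wpi_a, wpi_b, a)]
    return float(max(overlapping)) if overlapping else 0.0

# Hypothetical WPI samples: one self-overlap-like, one clearly separated from it.
rng = np.random.default_rng(0)
wpi_self = rng.normal(0.60, 0.05, size=2000)
wpi_other = rng.normal(0.90, 0.01, size=2000)

a_max = alpha_max(wpi_self, wpi_other)
differ_at_90 = a_max <= 0.1   # rejection at the 90% level, as described in step c(2)
```

Evaluating `alpha_max` at every frequency and subinterval length gives the kind of map described in the following paragraph.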
The end result of Steps a-c(1) is a map of locations in parameter space where the two
time series differ at confidence level α. If Step c(2) is used, a map of the confidence level
at which the time series differ results. The effects of changing model parameters may be
immediately seen (e.g., Figure 3). Test cases, where the CORE hindcast is tested against
a version of itself ‘contaminated’ with an AR(1) ‘red noise’ spectrum of varying amplitude
(not pictured), yield reliable results; CORE does not differ from itself by this metric.
Validation is then performed on three model runs: CCSMcontrol, the CM2.1 run dis-
cussed earlier, and an additional 600-year CCSM run using a lower value of the threshold
relative humidity for cloud formation, hereafter ‘RHLOW’. Frequency ‘bleeding’ is prevented
by using model subintervals of the same length as CORE (in this case, 55 years); results
are found in Figure 3 (left) where horizontal lines indicate differences at the 80, 90 and
95% levels. CCSMcontrol agrees relatively well with CORE everywhere except the 6-12 year
band. RHLOW does somewhat better in the 6-12 year band, but does not agree as well with
CORE at longer periods. Both CCSM runs agree more strongly with CORE in the 2-8 year
band than does CM2.1, but all models perform poorly at 8-12 years.
Model/model comparison is then performed for CCSMcontrol/CM2.1 and CCSMcon-
trol/RHLOW (Figure 3, right): CCSMcontrol and CM2.1 differ throughout the 4-10 year
band, but only at long (≥ 200 year) subinterval lengths. In contrast, for the CCSMcon-
trol/RHLOW comparison, long-period agreement is generally good, and the areas of dis-
agreement are smaller than for CCSMcontrol/CM2.1. CCSMcontrol and RHLOW disagree
at 2-8 years for subintervals longer than 200 years, and RHLOW shows better general agree-
ment with CORE for shorter periods. CCSMcontrol may therefore be considered less accu-
rate for short-period ENSO. The reverse is true for the 5-8 year band, where CCSMcontrol is
more consistent with CORE. Likewise for CCSMcontrol vs. CM2.1, where CCSM shows bet-
ter overall agreement with data yet the models disagree with one another, this test indicates
that CCSMcontrol does a better job representing ENSO variability.
The above test cases form ‘sanity checks’, in that CCSM runs are closer to one another
than to CM2.1. Also, an ‘intermediate’ comparison case (not pictured) shows intermediate
results: a test run using the dynamic chlorophyll feedback of Jochum (2009) differs from
CCSMcontrol at 85% significance throughout the ENSO band. We expect this method to
usefully quantify true physical differences between models.
3. Conclusions
Wavelet probability analysis is a robust method of measuring agreement between two or
more data sets. Using the PDF of the NINO3.4 wavelet power, CCSM3.5 is seen to agree
well with the ocean hindcast product of Large and Yeager (2004), lending credence to the
use of this model as a baseline for the study of long-term ENSO variability.
Self-agreement depends on the record length; the 90% confidence interval on the self-
overlap WPI distribution narrows exponentially with record length, and in general halves
every 80 years. Using a 1,200 year run of the CCSM3.5 and a 2,000 year run of the GFDL
CM2.1, statistically identical regressions are found; this property may be exploited to provide
the expected level of agreement for a model run of arbitrary length. 250 years is typically
sufficient to illustrate 90% of the range of ENSO behavior, and should be viewed as a
minimum length for future ‘long’ baseline simulations.
Tuning shorter model runs is demonstrated using an empirical hypothesis testing pro-
cedure on CCSM and CM2.1, using the ocean hindcast of Large and Yeager (2004) as a
reference. CCSM is more likely to agree with the instrumental record than the GFDL CM2.1
at short periods; however, CCSM and CM2.1 are consistent at periods longer than 12 years.
Differences between CCSM3.5 and CM2.1 at some frequencies are detectable only for model
subintervals longer than 200 years; this is the suggested minimum length for model inter-
comparison studies. More dramatic changes to model parameters lead to more dramatic
inter-model differences, providing evidence that the method is sensitive to the degree of
physical changes.
Wavelet probability analysis is a simple but powerful tool which provides robust statistical
limits on the expected level of agreement between time series of any length, from any source;
this technique should prove to be very useful for the development of future climate models.
Acknowledgments.
SS is supported by the NASA Earth & Space Science Fellowship (NESSF). A. Wittenberg
is gratefully acknowledged for providing the NINO3.4 time series from GFDL CM2.1.
REFERENCES
Brown, J., A. W. Tudhope, M. Collins, and H. V. McGregor, 2008: Mid-Holocene ENSO:
Issues in quantitative model-proxy data comparisons. Paleoceanography, 23, PA3202.
Capotondi, A., A. Wittenberg, and S. Masina, 2006: Spatial and temporal structure of Trop-
ical Pacific interannual variability in 20th century coupled simulations. Ocean Modelling,
15, 274–298.
Daubechies, I., 1990: The wavelet transform, time-frequency localization and signal analysis.
IEEE Trans. Inform. Theory, 36, 961–1004.
Guilyardi, E., A. Wittenberg, A. Fedorov, M. Collins, C. Wang, A. Capotondi, G. Jan van
Oldenborgh, and T. Stockdale, 2009: Understanding El Nino in ocean-atmosphere general
circulation models: Progress and challenges. Bull. Amer. Meteor. Soc., 325–340.
Jochum, M., 2009: Impact of latitudinal variations in vertical diffusivity on climate simula-
tions. Journal of Geophysical Research - Oceans, 114, C01 010.
Jochum, M., B. Fox-Kemper, P. Molnar, and C. Shields, 2009a: Differences in the Indonesian
seaway in a coupled climate model and their relevance to Pliocene climate and El Nino.
Paleoceanography, 24, PA1212.
Jochum, M., S. Yeager, K. Lindsay, K. Moore, and R. Murtugudde, 2009b: Quantification
of the feedback between phytoplankton and ENSO in the Community Climate System
Model. Journal of Climate.
Large, W. G. and S. G. Yeager, 2004: Diurnal to decadal global forcing for ocean and
sea-ice models: the data sets and flux climatologies. NCAR Technical Note, NCAR/TN–
460/STR.105.
Liu, Y., X. S. Lian, and R. H. Weisberg, 2007: Rectification of the bias in the wavelet power
spectrum. Journal of Atmospheric and Oceanic Technology, 24, 2093–2102.
McGregor, H. V. and M. K. Gagan, 2004: Western Pacific coral δ18O records of anomalous
Holocene variability in the El Nino-Southern Oscillation. Geophysical Research Letters,
31, L11 204.
McPhaden, M. J., 1999: Genesis and evolution of the 1997-98 El Nino. Science, 283, 950–
954.
Meinen, C. S. and M. J. McPhaden, 2000: Observations of warm water volume changes
in the equatorial Pacific and their relationship to El Nino and La Nina. J. Clim., 13,
3551–3559.
Neale, R. B., J. H. Richter, and M. Jochum, 2008: The impact of convection on ENSO:
From a delayed oscillator to a series of events. Journal of Climate, submitted.
Ropelewski, C. F. and M. S. Halpert, 1996: Quantifying Southern Oscillation-precipitation
relationships. Journal of Climate, 9, 1043–1059.
Seager, R., 2007: The turn of the century North American drought: Global context,
dynamics, and past analogs. Journal of Climate, 20, 5527–5552.
Torrence, C. and G. Compo, 1998: A practical guide to wavelet analysis. Bull. Amer. Meteor.
Soc., 79, 61–78.
Trenberth, K. E., G. W. Branstator, D. Karoly, A. Kumar, N.-C. Lau, and C. Ropelewski,
1998: Progress during TOGA in understanding and modeling global teleconnections as-
sociated with tropical sea surface temperatures. Journal of Geophysical Research, 103,
14,291–14,324.
Wittenberg, A. T., 2009: Are historical records sufficient to constrain ENSO simulations?
Geophysical Research Letters, 36, L12 702.
Wittenberg, A. T., A. Rosati, N.-C. Lau, and J. J. Ploshay, 2006: GFDL's CM2 global coupled
climate models. Part III: Tropical Pacific climate and ENSO. Journal of Climate, 19, 698–722.
Zhang, X. and M. J. McPhaden, 1995: Wind stress variations and interannual sea surface
temperature anomalies in the eastern equatorial Pacific. Journal of Climate, 19, 226–241.
List of Tables
1 Dependence of the 90% WPI confidence interval width on model subinterval
length, from confidence intervals averaged over the 2-6 year band. ∆β0 and
∆β1 refer to the bounds of the 90% confidence intervals on those coefficients.
2 WPI values for CCSMcontrol self-overlap calculation as a function of
subinterval length L at a variety of confidence levels, averaged over the 2-6 year
band.
3 WPI values for CCSMcontrol model/data calculation at a variety of confidence
levels, averaged over the 2-6 year band.
Table 1. Dependence of the 90% WPI confidence interval width on model subinterval length, from confidence intervals averaged over the 2-6 year band. ∆β0 and ∆β1 refer to the bounds of the 90% confidence intervals on those coefficients.
Run            β0        β1         ∆β0                  ∆β1
CCSMcontrol    -0.553    -0.0078    -0.920 to -0.185     -0.0091 to -0.0065
GFDL CM2.1     -0.237    -0.0098    -0.458 to -0.015     -0.011 to -0.0090
Table 2. WPI values for CCSMcontrol self-overlap calculation as a function of subinterval length L at a variety of confidence levels, averaged over the 2-6 year band.
L      2.5%      5%        10%       90%       95%       97.5%
50     0.5007    0.5098    0.5229    0.6296    0.6393    0.6478
100    0.6604    0.6646    0.6739    0.7433    0.7466    0.7484
200    0.8147    0.8156    0.8172    0.8384    0.8402    0.8415
400    0.9201    0.9203    0.9208    0.9262    0.9265    0.9269
Table 3. WPI values for CCSMcontrol model/data calculation at a variety of confidence levels, averaged over the 2-6 year band.
Run length (yrs)    2.5%     5%       10%      90%      95%      97.5%
50                  0.607    0.647    0.709    0.953    0.965    0.972
100                 0.751    0.778    0.805    0.956    0.966    0.973
200                 0.824    0.834    0.848    0.953    0.961    0.964
400                 0.865    0.869    0.875    0.933    0.936    0.938
List of Figures
1 Probability distribution functions for mean NINO3.4 wavelet power. The
white line represents the median value for the model run, while the gray
line is the mean value generated using the CORE hindcast. Dashed black
lines correspond to the 25th and 75th percentile values for the model run
(interquartile range).
2 Left-hand panels (a,b): 90% confidence interval on WPI distributions for
self-overlap calculations (a = CCSMcontrol, b = CM2.1). Center panels (c,d):
same as left-hand panels, for model/data WPI distributions. Right-hand
panels (e,f) show regression of 90% confidence interval widths against subinterval
length, for self-overlap calculations (top) and data/model comparisons
(bottom). In panels e and f, CCSMcontrol data appears as red X’s, CM2.1 as blue
squares.
3 Results of hypothesis testing procedure. Left panels: validation of
CCSMcontrol (top), RHLOW (middle) and CM2.1 (bottom) against the CORE
hindcast. Right panels: comparison of CCSMcontrol vs. CM2.1 (top) and
CCSMcontrol vs. RHLOW (bottom). In all panels, confidence levels plotted
range from 0 (agreement) to 1 (disagreement).
Fig. 1. Probability distribution functions for mean NINO3.4 wavelet power. The white line represents the median value for the model run, while the gray line is the mean value generated using the CORE hindcast. Dashed black lines correspond to the 25th and 75th percentile values for the model run (interquartile range).
Fig. 2. Left-hand panels (a,b): 90% confidence interval on WPI distributions for self-overlap calculations (a = CCSMcontrol, b = CM2.1). Center panels (c,d): same as left-hand panels, for model/data WPI distributions. Right-hand panels (e,f) show regression of 90% confidence interval widths against subinterval length, for self-overlap calculations (top) and data/model comparisons (bottom). In panels e and f, CCSMcontrol data appears as red X’s, CM2.1 as blue squares.
Fig. 3. Results of hypothesis testing procedure. Left panels: validation of CCSMcontrol (top), RHLOW (middle) and CM2.1 (bottom) against the CORE hindcast. Right panels: comparison of CCSMcontrol vs. CM2.1 (top) and CCSMcontrol vs. RHLOW (bottom). In all panels, confidence levels plotted range from 0 (agreement) to 1 (disagreement).