Noname manuscript No.
(will be inserted by the editor)
Data-driven prediction strategies for low-frequency patterns of North
Pacific climate variability
Darin Comeau · Zhizhen Zhao · Dimitrios Giannakis · Andrew J. Majda
Received: date / Accepted: date

D. Comeau
Center for Atmosphere Ocean Science, Courant Institute of Mathematical Sciences, New York University, New York, NY.
E-mail: [email protected]
Abstract The North Pacific exhibits patterns of low-frequency variability on the intra-annual to decadal time scales, which manifest themselves in both model data and the observational record, and prediction of such low-frequency modes of variability is of great interest to the community. While parametric models, such as stationary and non-stationary autoregressive models, possibly including external factors, may perform well in a data-fitting setting, they may perform poorly in a prediction setting. Ensemble analog forecasting, which relies on the historical record to provide estimates of the future based on past trajectories of those states similar to the initial state of interest, provides a promising, nonparametric approach to forecasting that makes no assumptions on the underlying dynamics or its statistics. We apply such forecasting to low-frequency modes of variability for the North Pacific sea surface temperature and sea ice concentration fields extracted through Nonlinear Laplacian Spectral Analysis. We find such methods may outperform parametric methods and simple persistence with increased predictive skill.
1 Introduction
Predictability in general circulation models (GCMs) for the North Pacific, from seasonal to decadal time scales, has been the subject of many recent studies (e.g., Tietsche et al (2014), Blanchard-Wrigglesworth and Bitz (2014), Blanchard-Wrigglesworth et al (2011a)), several of which focus on the role of the initial state (e.g., Collins (2002), Blanchard-Wrigglesworth et al (2011b), Branstator et al (2012), Day et al (2014)). The North Pacific exhibits prominent examples of interannual to decadal variability, such as the Pacific Decadal Oscillation (PDO; Mantua et al, 1997), and the more rapidly decorrelating North Pacific Gyre Oscillation (NPGO; Di Lorenzo et al, 2008), both of which have been the subject of much interest. An important phenomenon of intra-annual variability in the North Pacific is the reemergence of anomalies in both the sea surface temperature (SST) field (Alexander et al, 1999) and the sea ice concentration (SIC) field (Blanchard-Wrigglesworth et al, 2011a), where regional anomalies in these state variables vanish over a season and reappear several months later, as made evident by high time-lagged correlations.
The North Pacific (along with the North Atlantic) is a region of relatively strong low-frequency variability in the global climate system (Branstator et al, 2012). Yet it has been shown that this region exhibits a relative lack of predictability in GCMs (less than a decade; Collins, 2002), with the Community Climate System Model (CCSM) having particularly weak persistence and low predictability in the North Pacific among similar GCMs (Branstator et al, 2012). The ocean and sea ice systems show stronger low-frequency variability than the atmosphere (Newman et al, 2003). The internal variability exhibited in the North Pacific has also been characterized as being in distinct climate regimes (e.g. Overland et al (2006, 2008)), where the dynamics exhibit regime transitions and metastability. As a result, cluster-based methods have been a popular approach to model regime behavior in climate settings, such as in Franzke et al (2008, 2009), where hidden Markov models were used to model atmospheric flows.
A traditional modeling approach for the PDO has been to fit an autoregressive process (Hasselmann, 1976; Frankignoul and Hasselmann, 1977), as well as models externally forced from the tropics through ENSO (Newman et al, 2003). In Giannakis and Majda (2012b), autoregressive models were successful in predicting temporal patterns corresponding to the PDO and NPGO when forced with suitably modulated intermittent modes. Additional flexibility can be built into regression models by allowing nonstationary, i.e. time-dependent, coefficients; a successful example of this is the finite element method (FEM) clustering procedure combined with the multivariate autoregressive factor (VARX) model framework (Horenko, 2010a,b). In this approach, the data is partitioned into a predetermined number of clusters, and model regression coefficients are estimated on each cluster, together with a cluster affiliation function that indicates which model is used at a particular time. This can further be adapted to account for external factors (Horenko, 2011a). While FEM-VARX methods can be effective at fitting the desired data, advancing the system state into the future for a prediction depends on being able to successfully predict the unknown cluster affiliation function. Methods for using this framework in a prediction setting have been used in Horenko (2011a,b). Another regression modeling approach was put forward in Majda and Harlim (2013), where physics constraints were imposed on multilevel nonlinear regression models, preventing ad hoc, finite-time blow-up, a pathological behavior previously shown to exist in such models without physics constraints in Majda and Yuan (2012). Appropriate nonlinear regression models using this strategy have been shown to have high skill for predicting the intermittent cloud patterns of tropical intra-seasonal variability (Chen et al, 2014; Chen and Majda, 2015b,a).
Parametric models may perform well for fitting, but often have poor performance in a prediction setting, particularly in systems that exhibit distinct dynamical regimes. Nonparametric models can be advantageous in systems where the underlying dynamical system is unknown or imperfect. An early example of this is an analog forecast, first introduced by Lorenz (1969), where one considers the historical record and makes predictions based on examining state variable trajectories in the past that are similar to the current state. Analog forecasting does not make any assumptions on the underlying dynamics, and cleverly avoids model error when the underlying model is observations from nature. This has since been applied to other climate prediction scenarios, such as the Southern Oscillation Index (Drosdowsky, 1994), the Indian summer monsoon (Xavier and Goswami, 2007), and wind forecasting (Alessandrini et al, 2015), where it was found to be particularly useful in forecasting rare events.
Key to the success of an analog forecasting method is the ability to identify a good historical analog to the current initial state. In climate applications, the choice of analog is usually determined by minimizing the Euclidean distance between snapshots of system states, with a single analog being selected (Lorenz, 1969; Branstator et al, 2012). In Zhao and Giannakis (2014), analog forecasting was extended in two key ways. First, the state vectors considered were in Takens lagged embedding space (Takens, 1981), which captures some of the dynamics of the system, rather than a snapshot in time. Second, instead of selecting a single analog determined by Euclidean distance, weighted sums of analogs were considered, where the weights are determined by a kernel function. In this context, a kernel is an exponentially decaying pairwise similarity measure, intuitively playing the role of a local covariance matrix. In Zhao and Giannakis (2014), kernels were introduced in the context of Nonlinear Laplacian Spectral Analysis (NLSA; Giannakis and Majda, 2012c, 2013, 2014) for decomposing high-dimensional spatiotemporal data, leading naturally to a class of low-frequency observables for prediction through kernel eigenfunctions. In Bushuk et al (2014), the kernel used in the NLSA algorithm was adapted to be multivariate, allowing for multiple variables, possibly of different physical units, to be considered jointly in the analysis.
The aim of this study is to develop low-dimensional, data-driven models for prediction of dominant low-frequency climate variability patterns in the North Pacific, combining the approaches laid out in Zhao and Giannakis (2014) and Bushuk et al (2014). We invoke a prediction approach that is purely statistical, making use only of the available historical record, possibly corrupted by model error, and the initial state itself. The approach utilizes out-of-sample extension methods (Coifman and Lafon, 2006b; Rabin and Coifman, 2012) to define our target observable beyond a designated training period. There are some key differences between this study and that of Zhao and Giannakis (2014). First, we consider multivariate data, including SIC, which is intrinsically noisier than SST, given that SIC is a thresholded variable and experiences high variability in the marginal ice zone. We also make use of observational data, which is a relatively short time series compared to GCM model data. In addition to considering kernel eigenfunctions as our target observable, which we will see are well-suited for prediction, we also consider integrated sea ice anomalies, a more challenging observable uninfluenced by the data analysis algorithm. We compare the performance of these kernel ensemble analog forecasting techniques to some parametric forecast models (specifically, autoregressive and FEM-VARX models), and find the former to have higher predictive skill than the latter, which often cannot even outperform the simple persistence forecast.
The rest of the paper is outlined as follows. In Section 2, we discuss the mathematical methods used to perform kernel ensemble analog forecasting predictions, and also discuss alternative parametric forecasting methods. In Section 3 we describe the data sets used for our experiments, and in Section 4 we present our results. Discussion and concluding remarks are in Section 5.
2 Methods
Our forecasting approach is motivated by using kernels, a pairwise measure of similarity for the state vectors of interest. A simple example of such an object is a Gaussian kernel:

$$w(x_i, x_j) = e^{-\|x_i - x_j\|^2/\sigma_0}, \qquad (1)$$
where $x$ is a state variable of interest, and $\sigma_0$ is a scale parameter that controls the locality of the kernel. The kernels are then used to generate a weighted ensemble of analog forecasts, making use of an available historical record, rather than relying on a single analog. To extend the observable we wish to predict beyond the historical period into the future, we make use of out-of-sample extension techniques. These techniques allow one to extend a function (observable) to a new point $y$ by looking at values of the function at known points $x_i$ close to $y$, where closeness is determined by the kernels. An important part of any prediction problem is to choose a target observable that is both physically meaningful and exhibits high predictability. As demonstrated in Zhao and Giannakis (2014), defining a kernel on the desired space naturally leads to a preferred class of observables that exhibit time-scale separation and good predictability.
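As a concrete illustration of Equation (1), the following minimal sketch computes a Gaussian kernel matrix; the data matrix `X` and the scale `sigma0` are hypothetical inputs chosen for illustration, not quantities from the paper.

```python
import numpy as np

def gaussian_kernel(X, sigma0):
    """Pairwise Gaussian kernel of Equation (1):
    w(x_i, x_j) = exp(-||x_i - x_j||^2 / sigma0).

    X : (n, m) array whose rows are state vectors; sigma0 : scale parameter.
    """
    # Squared Euclidean distances between all pairs of rows of X.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / sigma0)
```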
2.1 Nonlinear Laplacian spectral analysis
Nonlinear dynamical systems generally give rise to datasets with low-dimensional nonlinear geometric structures (e.g., attractors). We therefore turn to a data analysis technique, NLSA, that allows us to extract spatio-temporal patterns from data from a high-dimensional non-linear dynamical system, such as a coupled global climate system (Giannakis and Majda, 2012a,c, 2013, 2014). The standard NLSA algorithm is a nonlinear manifold generalization of singular spectrum analysis (SSA) (Ghil et al, 2002), where the covariance operator is replaced by a discrete Laplace-Beltrami operator to account for the non-linear geometry of the underlying data manifold. The eigenfunctions of this operator then form a convenient orthonormal basis on the data manifold. A key advantage of NLSA is that no pre-processing of the data, such as band-pass filtering or seasonal partitioning, is needed.
2.1.1 Time-lagged embedding
An important first step in the NLSA algorithm is to perform a time-lagged embedding of the spatio-temporal data as a method of inducing time-scale separation in the extracted modes. An analog forecast method is driven by the initial data, and rather than using a snapshot in time of the state as our initial condition, time-lagged embedding incorporates some of the system's dynamics into the selection of an analog and helps make the data more Markovian.
Let $z(t_i) \in \mathbb{R}^d$ be a time series sampled uniformly with time step $\delta t$, on a grid of size $d$, with $i = 1, \ldots, N$ samples. We construct a lag-embedding of the data set with a window of length $q$, and consider the lag-embedded time series

$$x(t_i) = \left(z(t_i), z(t_{i-1}), \ldots, z(t_{i-(q-1)})\right) \in \mathbb{R}^{dq}.$$
The data now lies in $\mathbb{R}^m$, with $m = dq$ the dimension of the lagged embedded space, and $n = N - q + 1$ the number of samples in lagged embedded space (also called Takens embedding space, or delay coordinate space). It has been shown that time-lagged embedding recovers the topology of the attractor of the underlying dynamical system that has been lost through partial observations (Takens, 1981; Sauer et al, 1991). In particular, the embedding affects the non-linear geometry of the underlying data manifold $\mathcal{M}$ in such a way as to allow for dynamically stable patterns with time scale separation (Berry et al, 2013), a desirable property that will lead to observables with high predictability.
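A minimal sketch of this lag embedding, assuming the snapshots $z(t_i)$ are stored as rows of an $(N, d)$ array; the function name and memory layout are illustrative choices, not from the paper.

```python
import numpy as np

def lag_embed(z, q):
    """Map z(t_i) in R^d to x(t_i) = (z(t_i), ..., z(t_{i-(q-1)})) in R^{dq}.

    z : (N, d) array of snapshots; returns an (N - q + 1, d*q) array.
    """
    N, d = z.shape
    n = N - q + 1
    x = np.empty((n, d * q))
    for i in range(n):
        # Most recent snapshot first, matching the convention in the text.
        x[i] = z[i:i + q][::-1].reshape(-1)
    return x
```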
2.1.2 Discrete Laplacian
The next step is to define a kernel on the data manifold $\mathcal{M}$. Rather than using a simple Gaussian as in Equation (1), the NLSA kernel makes use of phase velocities $\xi_i = \|x(t_i) - x(t_{i-1})\|$, which form a vector field on the data manifold and provide additional important dynamic information. The NLSA kernel we use is

$$\mathcal{K}(x(t_i), x(t_j)) = \exp\left(-\frac{\|x(t_i) - x(t_j)\|^2}{\epsilon\, \xi_i\, \xi_j}\right). \qquad (2)$$

With this kernel $\mathcal{K}$ and associated matrix $K_{ij} = \mathcal{K}(x(t_i), x(t_j))$, we solve the Laplacian eigenvalue problem to acquire an eigenfunction basis $\{\phi_i\}$ on the data manifold $\mathcal{M}$. To do this, we construct the discrete graph Laplacian by following the diffusion maps approach of Coifman and Lafon (2006a), forming the matrices

$$Q_i = \sum_{j=1}^n K_{ij}, \quad \tilde{K}_{ij} = \frac{K_{ij}}{Q_i^{\alpha} Q_j^{\alpha}}, \quad D_i = \sum_{j=1}^n \tilde{K}_{ij}, \quad P_{ij} = \frac{\tilde{K}_{ij}}{D_i}, \quad L_{ij} = I - P_{ij}.$$

Here $\alpha$ is a real parameter, typically with value $0$, $\tfrac{1}{2}$, or $1$. We note that in the large data limit, as $n \to \infty$ and $\epsilon \to 0$, this discrete Laplacian converges to the Laplace-Beltrami operator on $\mathcal{M}$ for a Riemannian metric that depends on the kernel (Coifman and Lafon, 2006a). We can therefore think of the kernel as biasing the geometry of the data to reveal a class of features, and the NLSA kernel does this in such a way as to extract dynamically stable modes with time scale separation. We then solve the eigenvalue problem

$$L\phi_i = \lambda_i \phi_i. \qquad (3)$$
The resulting Laplacian eigenfunctions $\phi_i = (\phi_{1i}, \ldots, \phi_{ni})^T$ form an orthonormal basis on the data manifold $\mathcal{M}$ with respect to the weighted inner product

$$\langle \phi_i, \phi_j \rangle = \sum_{k=1}^n D_k\, \phi_{ki}\, \phi_{kj} = \delta_{ij}.$$
As well as forming a convenient orthonormal basis, the eigenfunctions $\phi_i$ give us a natural class of observables with good time-scale separation and high predictability. These eigenfunctions $\phi_i$ are time series, nonlinear analogs to principal components, and can be used to recreate spatio-temporal modes, similar to extended empirical orthogonal functions (EOFs) (Giannakis and Majda, 2012b,c, 2013). However, unlike EOFs, the eigenfunctions $\phi_i$ do not measure variance, but rather measure oscillations or roughness in the abstract space $\mathcal{M}$, the underlying data manifold. The eigenvalues $\lambda_i$ measure the Dirichlet energy of the corresponding eigenfunctions, which has the interpretation of being squared wavenumbers on this manifold (Giannakis and Majda, 2014). We now use the leading low-frequency kernel eigenfunctions as our target observables for prediction.
2.1.3 Multiple components
As described in Bushuk et al (2014), the above NLSA algorithm can be modified to incorporate more than one time series. Let $z^{(1)}, z^{(2)}$ be two signals, sampled uniformly with time step $\delta t$, on (possibly different) $d_1, d_2$ grid points. After lag-embedding each variable with embedding windows $q_1, q_2$ into its appropriate embedding space, so $z^{(1)}(t_i) \in \mathbb{R}^{d_1} \mapsto x^{(1)}(t_i) \in \mathbb{R}^{d_1 q_1}$ and $z^{(2)}(t_i) \in \mathbb{R}^{d_2} \mapsto x^{(2)}(t_i) \in \mathbb{R}^{d_2 q_2}$, we construct the kernel function $\mathcal{K}$ by scaling the physical variables $x^{(1)}, x^{(2)}$ to be dimensionless:

$$K_{ij} = \exp\left(-\frac{\|x^{(1)}(t_i) - x^{(1)}(t_j)\|^2}{\epsilon\, \xi_i^{(1)} \xi_j^{(1)}} - \frac{\|x^{(2)}(t_i) - x^{(2)}(t_j)\|^2}{\epsilon\, \xi_i^{(2)} \xi_j^{(2)}}\right). \qquad (4)$$
This can then be extended to any number of variables, regardless of physical units, and allows for analysis of coupled systems, such as the ocean and sea ice components of a climate model. An alternative approach to Equation (4) for extending the NLSA kernel to multiple variables with different physical units is to first normalize each variable to unit variance. However, a drawback of this approach is that information about relative variability is lost as the ratios of the variances are set equal, rather than letting the dynamics of the system control the variance ratios of the different variables (Bushuk et al, 2014) as in Equation (4).
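A sketch of the two-component kernel in Equation (4); the array names are illustrative. The division by the phase velocities makes each term dimensionless, which is what allows variables with different physical units to be combined.

```python
import numpy as np

def multivariate_nlsa_kernel(x1, x2, eps):
    """Two-component NLSA kernel of Equation (4).

    x1, x2 : (n, m1) and (n, m2) lag-embedded time series; eps : scale epsilon.
    """
    def scaled_sq_dists(x):
        # Phase velocities xi_i = ||x(t_i) - x(t_{i-1})||; the first sample
        # has no predecessor and is dropped.
        xi = np.linalg.norm(np.diff(x, axis=0), axis=1)
        x = x[1:]
        sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
        return sq / (eps * np.outer(xi, xi))

    return np.exp(-scaled_sq_dists(x1) - scaled_sq_dists(x2))
```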
2.2 Out-of-sample extension
Now that we have established our class of target observables, namely the eigenfunctions $\phi_i$ in Equation (3), we need a method for extending these observables into the future to form our predictions, for which we draw upon out-of-sample extension techniques. To be precise, let $f$ be a function defined on a set $M = \{x_1, \ldots, x_n\}$, $x_i \in \mathbb{R}^m$; $f$ may be vector-valued, but in our case is scalar. We wish to make a prediction of $f$ by extending the function to be defined on a point outside of the training set $M$, by performing an out-of-sample extension, which we call $\bar{f}$. There are some desirable qualities we wish to have in such an extension, namely that $\bar{f}$ is in some way well-behaved and smooth on our space, and is consistent as the number of in-samples increases. Below we discuss two such methods of out-of-sample extension: geometric harmonics, based on the Nyström method, and Laplacian pyramids.
2.2.1 Geometric harmonics
The first approach for out-of-sample extension is based on the Nyström method (Nyström, 1930), recently adapted to machine learning applications (Coifman and Lafon, 2006b), and represents a function $f$ on $M$ in terms of an eigenfunction basis obtained from a spectral decomposition of a kernel. While we use the NLSA kernel (Equation 2), other kernels could be used, another natural choice being a Gaussian kernel. For a general kernel $w : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$, consider its row-sum normalized counterpart:

$$W(y_i, x_j) = \frac{w(y_i, x_j)}{\sum_j w(y_i, x_j)}.$$

These kernels have the convenient interpretation of forming discrete probability distributions in the second argument, dependent on the first argument, so $W(y, x) = p_y(x)$, which will be a useful perspective later in our analog forecasting in Section 2.3. We then solve the eigenvalue problem

$$\lambda_l\, \varphi_l(x_i) = \sum_{j=1}^n W(x_i, x_j)\, \varphi_l(x_j).$$

We note the spectral decomposition of $W$ yields a set of real eigenvalues $\lambda_l$ and an orthonormal set of eigenfunctions $\varphi_l$ that form a basis for $L^2(M)$ (Coifman and Lafon, 2006b), and we can thus represent our function $f$ in terms of this basis:

$$f(x_i) = \sum_{j=1}^n \langle \varphi_j, f \rangle\, \varphi_j(x_i). \qquad (5)$$

Let $y \in \mathbb{R}^m$, $y \notin M$ be an out-of-sample, or test data, point to which we wish to extend the function $f$. If $\lambda_l \neq 0$, the eigenfunction $\varphi_l$ can be extended to any $y \in \mathbb{R}^m$ by

$$\bar{\varphi}_l(y) = \frac{1}{\lambda_l} \sum_{j=1}^n W(y, x_j)\, \varphi_l(x_j). \qquad (6)$$

This definition ensures that the out-of-sample extension is consistent when restricted to $M$, meaning $\bar{\varphi}_l(y) = \varphi_l(y)$ for $y \in M$. Combining Equations (5) and (6) allows $\bar{f}$ to be assigned for any $y \in \mathbb{R}^m$ by evaluating the eigenfunctions $\bar{\varphi}_l$ at $y$ and using the projection of $f$ onto these eigenfunctions as weights:

$$\bar{f}(y) = \sum_{j=1}^n \langle \varphi_j, f \rangle\, \bar{\varphi}_j(y). \qquad (7)$$
Equation (7) is called the Nyström extension, and in Coifman and Lafon (2006b) the extended eigenfunctions in Equation (6) are called geometric harmonics. We note this scheme becomes ill-conditioned since $\lambda_l \to 0$ as $l \to \infty$ (Coifman and Lafon, 2006b), so in practice the sum is truncated at some level $l$, usually determined by the decay of the eigenvalues. With the interpretation of the eigenvalues $\lambda$ as wavenumbers on the underlying data manifold, this truncation represents removing features from the data that are highly oscillatory in this space.
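The Nyström extension of Equations (6) and (7) reduces to a few matrix products. The sketch below makes the simplifying assumption that the training eigenfunctions `phi` are orthonormal columns under the standard inner product, with eigenvalues `lam`, and that `W_out` holds the row-normalized kernel weights $W(y, x_j)$ between test and training points; all names are illustrative.

```python
import numpy as np

def nystrom_extend(f, phi, lam, W_out):
    """Geometric harmonics extension of an observable f to test points.

    f : (n,) observable on the training set; phi : (n, l) leading
    eigenfunctions; lam : (l,) nonzero eigenvalues; W_out : (n_test, n)
    kernel weights W(y, x_j).
    """
    phi_out = (W_out @ phi) / lam   # Equation (6): extended eigenfunctions
    coeffs = phi.T @ f              # projections <phi_j, f>
    return phi_out @ coeffs         # Equation (7): Nystrom extension
```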
2.2.2 Laplacian pyramid
The geometric harmonics method is well suited for observables that have a tight bandwidth in the eigenfunction basis (particularly the eigenfunctions themselves), but for observables that require high levels of eigenfunctions in their representation, the above-mentioned ill-conditioning may hamper this method. An alternative to geometric harmonics is the Laplacian pyramid (Rabin and Coifman, 2012), which invokes a multiscale decomposition of the original function $f$ in its out-of-sample extension approach.
A family of kernels defined at different scales is needed, and for clarity of exposition we will use a family of Gaussian kernels $w_l$ (and their row-normalized counterparts $W_l$) at scales $l$:

$$w_l(x_i, x_j) = e^{-\|x_i - x_j\|^2/(\sigma_0/2^l)}. \qquad (8)$$
That is, $l = 0$ represents the widest kernel width, and increasing $l$ gives finer kernels resolving more localized structures.
For a function $f : M \to \mathbb{R}$, the Laplacian pyramid representation approximates $f$ in a multiscale manner by $f \approx s_0 + s_1 + s_2 + \cdots$, where the first level $s_0$ is defined by

$$s_0(x_k) = \sum_{i=1}^n W_0(x_i, x_k)\, f(x_i),$$

and we then evaluate the difference

$$d_1 = f - s_0.$$

We then iteratively define the $l$th level decomposition $s_l$:

$$s_l(x_k) = \sum_{i=1}^n W_l(x_i, x_k)\, d_l(x_i), \qquad d_l = f - \sum_{i=0}^{l-1} s_i.$$
Iteration is continued until some prescribed error tolerance $\|f - \sum_k s_k\| < \varepsilon$ is met.
Next we extend $f$ to a new point $y \in \mathbb{R}^m$, $y \notin M$ by

$$s_0(y) = \sum_{i=1}^n W_0(x_i, y)\, f(x_i), \qquad s_l(y) = \sum_{i=1}^n W_l(x_i, y)\, d_l(x_i)$$

for $l \geq 1$, and assign $\bar{f}$ the value

$$\bar{f}(y) = \sum_k s_k(y). \qquad (9)$$
That is, we have formed a multiscale representation of $f$ using weighted averages of $f$ for nearby inputs, where the weights are given by the scale of the kernel function. Since the kernel function can accept any inputs from $\mathbb{R}^m$, we can define these weights for points outside $M$, and thus define $\bar{f}$ by using weighted values of $f$ on $M$ (known), where now the weights are given by the proximity of the out-of-sample point $y \notin M$ to the input points $x_i \in M$. The parameter choices of the initial scale $\sigma_0$ and error tolerance $\varepsilon$ set the scale and cut-off of the dyadic decomposition of $f$ in the Laplacian pyramid scheme. We choose $\sigma_0$ to be the median of the pairwise distances of our training data, and the error tolerance $\varepsilon$ to be scaled by the norm of the observable over the training data, for example $10^{-6}\|f\|$. In our applications below, we use a multiscale family of NLSA kernels based on Equation (4) rather than the above family of Gaussian kernels.
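A sketch of the Laplacian pyramid extension (Equations (8) and (9)), using the Gaussian kernel family for simplicity and a fixed number of levels in place of the error-tolerance stopping rule; all names and the weighted-average interpretation of the sums are illustrative assumptions.

```python
import numpy as np

def laplacian_pyramid_extend(X_train, f, Y_test, sigma0, levels=5):
    """Multiscale out-of-sample extension of f from X_train to Y_test."""
    def W(A, B, scale):
        # Row-normalized Gaussian kernel at the given scale (Equation 8).
        sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
        w = np.exp(-sq / scale)
        return w / w.sum(axis=1, keepdims=True)

    f_bar = np.zeros(len(Y_test))
    d = f.copy()                                # d_0 = f
    for l in range(levels):
        scale = sigma0 / 2 ** l
        f_bar += W(Y_test, X_train, scale) @ d  # s_l extended to the test points
        d = d - W(X_train, X_train, scale) @ d  # d_{l+1} = f - (s_0 + ... + s_l)
    return f_bar                                # Equation (9)
```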
2.3 Kernel ensemble analog forecasting
The core idea of traditional analog forecasting is to identify a suitable analog to one's current initial state from a historical record, and then make a prediction based on the trajectory of that analog in the historical record. The analog forecasting approach laid out in Zhao and Giannakis (2014), which we use here, varies in a few important regards. First, the initial system state, as well as the historical record (training data), is in Takens embedding space, so that an analog is not determined by a snapshot in time alone, but by the current state with some history (a 'video'). Second, rather than using Euclidean distance, as in traditional analog forecasting, the distances we use are based on a defined kernel function, which reflects a non-Euclidean geometry on the underlying data manifold. The choice of geometry is influenced by the lagged embedding and choice of kernel, which we have done in a way that gives us time scale separation in the resulting eigenfunctions $\phi$, yielding high predictability. Third, rather than identify and use a single analog in the historical record, weighted sums of analogs are used, with weights determined by the kernel function.
For this last point, it is useful to view analog forecasting in the context of an expectation over an empirical probability distribution. For example, a traditional analog forecast of a new initial state $y$ at some time $\tau$ in the future, based on a selected analog $x_j = x(t_j)$, can be written as

$$\bar{f}(y, \tau) = \mathbb{E}_{p_y} S_\tau f = \sum_{i=1}^n p_y(x_i)\, f(x(t_i + \tau)) = f(x(t_j + \tau)),$$

where $p_y(x_i) = \delta_{ij}$ is the Dirac delta function, and $S_\tau f(x_i) = f(x(t_i + \tau))$ is the operator that shifts the timestamp of $x_i$ by $\tau$. To make use of more than one analog and move to an ensemble, we let $p_y$ be a more general discrete empirical distribution, dependent on the initial condition $y$, with probabilities (weights) determined by our kernel function. Written this way, we simply need to define the empirical probability distribution $p_y$ to form our ensemble analog forecast.
For the geometric harmonics method, we projected $f$ onto an eigenfunction basis $\phi_i$ (truncated at some level $l$) and performed an out-of-sample extension on each eigenfunction basis function, i.e.

$$\bar{f}(y) = \sum_{i=1}^l \langle \phi_i, f \rangle\, \bar{\phi}_i(y),$$

or, written in terms of an expectation over an empirical probability measure,

$$\bar{f}(y) = \mathbb{E}_{p_y} f = \sum_{i=1}^l \langle \phi_i, f \rangle\, \mathbb{E}_{p_y} \phi_i, \qquad \text{where} \quad \mathbb{E}_{p_y} \phi_i = \frac{1}{\lambda_i} \sum_{j=1}^n W(y, x_j)\, \phi_i(x(t_j)).$$

We then define our prediction for lead time $\tau$ via geometric harmonics by

$$\bar{f}(y, \tau) = \mathbb{E}_{p_y} S_\tau f = \sum_{i=1}^l \langle \phi_i, f \rangle\, \mathbb{E}_{p_y} S_\tau \phi_i, \qquad \text{where} \quad \mathbb{E}_{p_y} S_\tau \phi_i = \frac{1}{\lambda_i} \sum_{j=1}^n W(y, x_j)\, \phi_i(x(t_j + \tau)).$$
Similarly, the $\tau$-shifted ensemble analog forecast via Laplacian pyramids is

$$\bar{f}(y, \tau) = \mathbb{E}_{p_{y,0}} S_\tau f + \sum_{i=1}^l \mathbb{E}_{p_{y,i}} S_\tau d_i,$$

where $p_{y,i}(x) = W_i(y, x)$ corresponds to the probability distribution from the kernel at scale $i$.
Thus we have a method for forming a weighted ensemble of predictions that is non-parametric and data-driven through the use of a historical record (training data), which itself has been subject to analysis that reflects the dynamics of the high-dimensional system in the non-linear geometry of the underlying abstract data manifold $\mathcal{M}$, and which produces a natural preferred class of observables to target for prediction through the kernel eigenfunctions.
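Putting the pieces together, a kernel ensemble analog forecast at lead time $\tau$ is a weighted average of time-shifted training values of the observable. The sketch below assumes the weights $p_y(x_i)$ have already been computed from the kernel at the test point, and truncates the record so that shifted indices stay in sample; both choices are illustrative.

```python
import numpy as np

def ensemble_analog_forecast(f_train, p_y, tau):
    """E_{p_y} S_tau f = sum_i p_y(x_i) f(x(t_i + tau)).

    f_train : (n,) observable on the training set; p_y : (n,) kernel
    weights for the test point y; tau : lead time in samples.
    """
    shifted = f_train[tau:]                # S_tau f(x_i) = f(x(t_i + tau))
    w = p_y[:len(shifted)]                 # drop analogs whose future leaves the record
    return np.sum(w * shifted) / np.sum(w) # renormalize after truncation
```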
2.4 Autoregressive modeling
We wish to compare our ensemble analog prediction methods to more traditional parametric methods, namely autoregressive models of North Pacific variability (Frankignoul and Hasselmann, 1977), of the form

$$x(t+1) = \mu(t) + A(t)\, x(t) + \sigma(t)\, \varepsilon(t),$$
where $x(t)$ is our signal, $\mu(t)$ is the external forcing (possibly 0), $A(t)$ is the autoregressive term, and $\sigma(t)$ is the noise term, each to be estimated from the training data, and $\varepsilon(t)$ is a Gaussian process. In the stationary case, the model coefficients $\mu(t) = \mu$, $A(t) = A$, $\sigma(t) = \sigma$ are constant in time, and can be evaluated in an optimal way through ordinary least squares. In the non-stationary case, we invoke the FEM-VARX framework of Horenko (2010b) by clustering the training data into $K$ clusters, and evaluating coefficients $A_k$, $\sigma_k$, $k = 1, \ldots, K$ for each cluster (see Horenko (2010a,b) for more details on this algorithm). From the algorithm we obtain a cluster affiliation function $\Gamma(t)$, such that $\Gamma(t_i) = k$ indicates that at time $t_i$ the model coefficients are $A(t_i) = A_k$, $\sigma(t_i) = \sigma_k$. In addition to choosing the number of clusters $K$, the method also has a persistence parameter $C$ that governs the number of allowable switches between clusters, which must be chosen prior to calculating the model coefficients. These parameters are usually chosen to be optimal in the sense of the Akaike Information Criterion (AIC) (Horenko, 2010b; Metzner et al, 2012), an information-theoretic measure for model selection which penalizes overfitting by a large number of parameters.
As mentioned earlier, while non-stationary autoregressive models may perform better than stationary models in fitting, an inherent difficulty in a prediction setting is the advancement of the model coefficients $A(t), \sigma(t)$ beyond the training period, which in the above framework amounts to solely advancing the cluster affiliation function $\Gamma(t)$. If we call $p_k(t)$ the probability of the model being at cluster $k$ at time $t$, we can view $\Gamma(t)$ as determining a Markov switching process on the cluster member probabilities

$$p(t) = (p_1(t), \ldots, p_K(t)),$$
which over the training period will be 1 in one entry and 0 elsewhere at any given time. We can estimate the transition probability matrix $T$ of that Markov process by using the optimal cluster affiliation sequence $\Gamma(t)$ from the FEM-VARX framework (here optimal is for the training period). Assuming the Markov hypothesis, we can estimate the stationary probability transition matrix directly from $\Gamma(t)$ by

$$T_{ij} = \frac{N_{ij}}{\sum_{k=1}^K N_{ik}},$$

where $N_{ij}$ is the number of direct transitions from state $i$ to state $j$ (Horenko, 2011a; Franzke et al, 2009). This estimated transition probability matrix $T$ can be used to model the Markov switching process in the following ways.
2.4.1 Generating predictions of cluster affiliation
The first method we employ is to advance the cluster member probabilities $p(t)$ using the estimated probability transition matrix $T$ by the deterministic equation (Franzke et al, 2009)

$$p(t_0 + \tau) = p(t_0)\, T^\tau,$$

where $p(t_0) = (p_1(t_0), p_2(t_0))$ is the initial cluster affiliation, which is determined by which cluster center the initial point $x(t_0)$ is closest to, and $p_i(t_0)$ is either 0 or 1.
The second method we employ is to use the estimated transition matrix $T$ to generate a realization of the Markov switching process $\Gamma_R$, and use this to determine the model cluster at any given time, maintaining strict model affiliation. Thus $p_k(t) = 1$ if $\Gamma_R(t) = k$, and 0 otherwise.
2.5 Error metrics
To gauge the fidelity of our predictions, we will evaluate the average root-mean-square error (RMSE) and pattern correlation (PC) of our predictions $y$ against the ground truth $x$, where points in our test data set (of length $n'$) are used as initial conditions for our predictions. As a benchmark, we will compare each prediction approach to a simple persistence forecast $y(t) = y(0)$. The error metrics are calculated as
$$\mathrm{rms}^2(\tau) = \frac{1}{n'} \sum_{j=1}^{n'} \left(y(t_j + \tau) - x(t_j + \tau)\right)^2,$$

$$\mathrm{pc}(\tau) = \frac{1}{n'} \sum_{j=1}^{n'} \frac{\left(y(t_j + \tau) - \bar{y}(\tau)\right)\left(x(t_j + \tau) - \bar{x}(\tau)\right)}{\sigma_y(\tau)\, \sigma_x(\tau)},$$

where

$$\bar{y}(\tau) = \frac{1}{n'} \sum_{j=1}^{n'} y(t_j + \tau), \qquad \bar{x}(\tau) = \frac{1}{n'} \sum_{j=1}^{n'} x(t_j + \tau),$$

$$\sigma_y^2(\tau) = \frac{1}{n'} \sum_{j=1}^{n'} \left(y(t_j + \tau) - \bar{y}(\tau)\right)^2, \qquad \sigma_x^2(\tau) = \frac{1}{n'} \sum_{j=1}^{n'} \left(x(t_j + \tau) - \bar{x}(\tau)\right)^2.$$
An important note is that for data-driven observables, such as NLSA eigenfunctions or EOF principal components, there is no underlying ground truth when predicting into the future. As such, a ground truth for comparison needs to be defined when evaluating the error metrics, for which one can use the out-of-sample extended function $\bar{f}(y(t_j + \tau))$ as defined in Equations (7) and (9).
3 Datasets
3.1 CCSM model output
We use model data from the Community Climate System Model (CCSM), versions 3 and 4, for monthly SIC and SST data, restricted to the North Pacific, which we define as 20°–65°N, 120°E–110°W. CCSM3 model data is used from a 900 year control run (experiment b30.004) (Collins et al, 2006). The sea ice component is the Community Sea Ice Model (CSIM; Holland et al, 2006) and the ocean component is the Parallel Ocean Program (POP), both of which are sampled on the same nominal 1° grid. CCSM4 model data is used from a 900 year control run (experiment b40.1850), which uses the Community Ice CodE 4 model for sea ice (CICE4; Hunke and Lipscomb, 2008) and the POP2 model for the ocean component (Smith et al, 2010), also on a common nominal 1° grid. Specific differences and improvements between the two model versions can be found in Gent et al (2011).
NLSA was performed on these data sets, both in single and multiple component settings, with the same embedding window of $q_1 = q_2 = q = 24$ months for each variable, kernel scale $\epsilon = 2$, and kernel normalization $\alpha = 0$. A 24 month embedding window was chosen to allow for dynamical memory beyond the seasonal cycle, as is used in other NLSA studies (e.g., Giannakis and Majda (2012b, 2013); Bushuk et al (2014)). There are 6648 grid points for SST and 3743 for SIC in this region for these models, and with an embedding window of $q = 24$, our lagged embedded data lies in $\mathbb{R}^m$ with $m = 159{,}552$ for SST and $m = 89{,}832$ for SIC. For the purposes of a perfect model experiment, where the same model run is used for both the training and test data, we split the CCSM4 control run into two 400 year sets: years 100–499 for the training data set, and years 500–899 for out-of-sample test points. After embedding, this leaves us with $n = n' = 4777$ samples in each data set. In our model error experiment, we train on 800 years of the CCSM3 control run and use 800 years of CCSM4 for test data, giving us $n = n' = 9577$ data points.
In addition to low-frequency kernel eigenfunctions as observables, we also consider North Pacific integrated sea ice extent anomalies as a target for prediction in Section 4.2. This observable exhibits faster variability, on an intra-annual time-scale, and as such we reduce the embedding window from 24 months to 6 months for our kernel evaluation. Using the CCSM4 model training data, we define monthly anomalies by calculating a climatology $f_c$ of monthly mean sea ice extent. Let $v_j$ be the gridpoints in our domain of interest (North Pacific), $c$ the mean SIC, and $a$ the grid cell area; the sea ice extent anomaly that will be our target observable is then defined as

$$f(t_i) = \sum_j c(v_j, t_i)\, a(v_j) - f_c(t_i). \qquad (10)$$
3.2 Observational data
For observational data, we turn to the Met Office Hadley Centre's HadISST data set (Rayner et al, 2003), and use monthly data for SIC and SST, from years 1979–2012, sampled on a 1° latitude-longitude grid. We assign ice-covered grid points an SST value of −1.8°C, and have removed a trend from the data by calculating a linear trend for each month. There are 4161 spatial grid points for SST, for a lagged embedded dimension of $m = 99{,}864$, and 3919 grid points for SIC, yielding $m = 94{,}056$. For direct comparison with this observational data set, the above CCSM model data sets have been interpolated from the native POP 1° grid to a common 1° latitude-longitude grid. After embedding, we are left with $n' = 381$ observational test data points.
4 Results
4.1 Low-frequency NLSA modes
The eigenfunctions that arise from NLSA typically fall into one of three categories: i) periodic modes, which capture the seasonal cycle and its higher harmonics; ii) low-frequency modes, characterized by a red power spectrum and a slowly decaying autocorrelation function; and iii) intermittent modes, which have the structure of periodic modes modulated with a low-frequency envelope. The intermittent modes are dynamically important (Giannakis and Majda, 2012a), shifting between periods of high activity and quiescence, but carry little variance, and are thus typically missed or mixed between modes in classical SSA. Examples of each of these eigenfunctions arising from a CCSM4 data set with SIC and SST variables are shown in Figure 1, for years 100–499 of the pre-industrial control run. The corresponding out-of-sample extension eigenfunctions, defined through the Nyström method, are shown in Figure 2, and are computed using years 500–899 of the same CCSM4 pre-industrial control run as test (out-of-sample) data. We use the notation $\phi^S_{L1}$, $\phi^I_{L1}$, or $\phi^{SI}_{L1}$ to indicate whether the NLSA mode is from SST, SIC, or joint SST and SIC variables, respectively.
[Figure 1: panels showing a snapshot of the time series, the power spectral density, and the autocorrelation function for eigenfunctions $\phi_1$, $\phi_3$, $\phi_9$, $\phi_{10}$, $\phi_{14}$, $\phi_{15}$.]
Fig. 1 Select NLSA eigenfunctions for CCSM4 North Pacific SST and SIC data. $\phi_1, \phi_3$ are periodic (annual and semi-annual) modes, characterized by a peak in the power spectrum and an oscillatory autocorrelation function; $\phi_9, \phi_{14}$ are the two leading low-frequency modes, characterized by a red power spectrum and a slowly decaying monotone autocorrelation function; $\phi_{10}, \phi_{15}$ are intermittent modes, characterized by a broad peak in the power spectrum and a decaying oscillatory autocorrelation function.
We perform our prediction schemes for five year time leads by applying the kernel ensemble analog forecast methods discussed in Section 2.3 to the leading two low-frequency modes $\phi^{SI}_{L1}$, $\phi^{SI}_{L2}$ from NLSA on the North Pacific, shown in Figure 1. The leading low-frequency modes extracted through NLSA can be thought of as analogs of the well known PDO and NPGO modes, even in the multivariate setting. We have high correlations between the leading multivariate and univariate low-frequency NLSA modes, with $\mathrm{corr}(\phi^{SI}_{L1}, \phi^I_{L1}) = -0.9907$ for our analog of the NPGO mode, and $\mathrm{corr}(\phi^{SI}_{L2}, \phi^S_{L1}) = 0.8415$ for our analog of the PDO mode.
As a benchmark, we compare against the simple constant persistence forecast $y(t) = y(0)$, which can perform reasonably well given the long decorrelation time of these low-frequency modes, and in fact beats parametric autoregressive models, as we will see below. We define the ground truth itself to be the out-of-sample eigenfunction calculated by Equation (7), so by construction, our predictions by the geometric harmonics method are exact at time lag $\tau = 0$,
[Figure 2: panels showing a snapshot of the time series, the power spectral density, and the autocorrelation function for the out-of-sample extensions of the eigenfunctions in Figure 1.]
Fig. 2 Out-of-sample extension eigenfunctions, computed via Equation (7), for CCSM North Pacific SST and SIC data, associated with the same select (in-sample) eigenfunctions shown in Figure 1.
whereas predictions using Laplacian pyramids will have reconstruction errors at time lag $\tau = 0$.
4.1.1 Perfect model
We first consider the perfect model setting, where the same dynamics generate the training data and the test (forecast) data. This should give us a measure of the potential predictability of the methods. Snapshots of sample prediction trajectories along with the associated ground truth out-of-sample eigenfunction are shown in Figure 3. In Figure 4, we see that for the leading low-frequency mode $\phi^{SI}_{L1}$ (our NPGO analog), the ensemble based predictions perform only marginally better than persistence in the PC metric, but have improved RMSE scores over longer timescales. However, with the second mode $\phi^{SI}_{L2}$ (our PDO analog), we see a more noticeable gain in predictive skill with the ensemble analog methods over persistence. If we take 0.6 as a PC threshold (Collins, 2002), below which we no longer consider the model to have predictive skill, we see an increase of about 8 months in predictive skill with the ensemble analog methods over persistence, with skillful forecasts up to 20 months lead time. We note these low-frequency modes extracted from multivariate data exhibit similar predictability (as measured by when the PC falls below the 0.6 threshold) to their univariate counterparts ($\phi^S_{L1}$ or $\phi^I_{L1}$, results not shown).
[Figure 3: NPGO and PDO sample trajectories over 60 months for the truth (T), geometric harmonics (GH), and Laplacian pyramid (LP) forecasts.]
Fig. 3 Sample snapshot trajectories of ensemble analog prediction results via geometric harmonics (GH, blue, solid) and Laplacian pyramid (LP, blue, dashed), for the leading two low-frequency modes, in the perfect model setting using CCSM4 data. The ground truth (T, black) is itself an out-of-sample extension eigenfunction, as shown in Figure 2.
[Figure 4: RMSE and PC over 60 months for NPGO and PDO predictions via geometric harmonics (GH), Laplacian pyramid (LP), and persistence (P).]
Fig. 4 Kernel ensemble analog prediction results via geometric harmonics and Laplacian pyramid for the leading two low-frequency modes, in the perfect model setting using CCSM4 data. Both out-of-sample extension methods outperform the persistence forecast (P) in both error metrics, particularly for the $\phi^{SI}_{L2}$ (PDO) mode.
4.1.2 Model error
To incorporate model error into our prediction experiment, we train on the CCSM3 model data, serving as our 'model', and then use CCSM4 model data as our test data, serving the role of our 'nature'; the difference in the fidelity of the two model versions represents general model error. In this experiment, our ground truth is an out-of-sample eigenfunction trained on CCSM3 data and extended using CCSM4 test data. For the leading $\phi^{SI}_{L1}$ (NPGO) mode, we again see marginally increased predictive performance of the ensemble analog predictions over persistence at short time scales in PC in Figure 5, but at medium to long time scales this improvement is lost (though after the score has fallen below the 0.6 threshold). This could be due to the increased fidelity of the CCSM4 sea ice model component over its CCSM3 counterpart, where using the less sophisticated model data for training leaves us a bit handicapped in trying to predict the more sophisticated model data (Gent et al, 2011; Holland et al, 2012). The improvement in the predictive skill of the $\phi^{SI}_{L2}$ (PDO) mode over persistence is less pronounced in the presence of model error than it was in the perfect model case shown in Figure 4. Nevertheless, the kernel ensemble analog forecasts still provide a substantial improvement in skill compared to the persistence forecast, extending the PC = 0.6 threshold to 20 months.
[Figure 5: RMSE and PC over 60 months for NPGO and PDO predictions via geometric harmonics (GH), Laplacian pyramid (LP), and persistence (P), in the model error setting.]
Fig. 5 Kernel ensemble analog forecasting prediction results for the leading two low-frequency modes, with model error. CCSM3 model data is used for the in-sample data, and CCSM4 model data is used as the out-of-sample data. While the predictions still outperform persistence in the error metrics, there is less gain in predictive skill as compared to the perfect model case.
We can further examine the model error scenario by using the actual observational data set as our nature, and CCSM4 as our model (Figure 6). Given the short observational record used, far fewer prediction realizations are generated, adding to the noisiness of the error metrics. The loss of predictability is apparent, especially in the $\phi^{SI}_{L2}$ mode, where the kernel ensemble analog forecasts fail to beat persistence, and drop below 0.6 in PC by 10 months, half as long as the perfect model case.
4.1.3 Comparison with Autoregressive Models
We compare the ensemble analog predictions to standard stationary autoregressive models, as well as non-stationary models using the FEM-VARX framework of Horenko (2010b) discussed in Section 2.4, for the low-frequency modes $\phi^{SI}_{L1}$, $\phi^{SI}_{L2}$ generated from the CCSM4 model (Figure 7) and the HadISST observational (Figure 8) training data. For both sets of low-frequency modes, $K = 2$ clusters was judged optimal by the AIC, as mentioned in Section 2.4; see Horenko (2010b); Metzner
[Figure 6: RMSE and PC over 60 months for NPGO and PDO predictions via geometric harmonics (GH), Laplacian pyramid (LP), and persistence (P), with HadISST as out-of-sample data.]
Fig. 6 Ensemble analog prediction results for the leading two low-frequency modes, with model error. CCSM4 model data is used for the in-sample data, and HadISST observational data is used as the out-of-sample data. Here the method produces little advantage over persistence, given the model error between model and nature, and actually fails to beat persistence for the $\phi^{SI}_{L2}$ (PDO) mode.
et al (2012) for more details. The coefficients for each cluster are nearly the same (autoregressive coefficient close to 1, similar noise coefficients), apart from the constant forcing coefficient $\mu_i$, which is of almost equal magnitude and opposite sign across the two clusters, suggesting two distinct regime behaviors. In the stationary case, the external forcing coefficient $\mu$ is very close to 0.
In the top left panel of Figures 7 and 8, we display snapshots of the leading low-frequency mode $\phi^{SI}_{L1}$ (NPGO) trajectory reconstruction during the training period, for the stationary (blue, solid) and non-stationary (red, dashed) models, along with the cluster switching function associated with the non-stationary model. In both the CCSM4 model and HadISST data sets, the non-stationary model snapshot is a better representation of the truth (black, solid), and the benefit over the stationary model is more clearly seen in the CCSM4 model data, Figure 7, which has the benefit of a 400 year training period, as opposed to the shorter 16 year training period with the observational data set.
In the prediction setting, however, the non-stationary models, which are reliant on an advancement of the unknown cluster affiliation function $\Gamma(t)$ beyond the training period, as discussed in Section 2.4, fail to outperform their stationary counterparts in the RMSE and PC metrics (bottom panels of Figures 7 and 8). In fact, none of the proposed regression prediction models are able to outperform the simple persistence forecast in these experiments. As a measure of the potential prediction skill of the non-stationary models, by which we mean the skill attainable if perfect knowledge of the underlying optimal cluster switching function $\Gamma(t)$ were available over the test period, we have run the experiment of replacing the test data period with the training data set, and find exceedingly strong predictive performance, with PC between 0.7 and 0.8 for all time lags tested, up to 60 months. Similar qualitative results were found for the second leading low-frequency mode $\phi^{SI}_{L2}$ (PDO) for each data set (not shown). This suggests that the Markov hypothesis, the basis for the predictions of $\Gamma(t)$, is not accurate, and other methods incorporating more memory are needed.
[Figure 7: top left, FEM-VARX cluster affiliation and reconstructed mode over the training period; top right, sample trajectories; bottom, RMSE and PC over 60 months for the prediction methods P1, P2, K1, M1, M2, and PT.]
Fig. 7 Top left: Snapshot of the true CCSM4 $\phi^{SI}_{L1}$ (NPGO) trajectory (black) with reconstructed stationary (blue) and non-stationary $K = 2$ (red) FEM-VARX model trajectories, along with the corresponding model affiliation function $\Gamma(t)$ for the non-stationary case. Top right: Sample trajectories for various prediction methods: P2 = stationary, using FEM-VARX model coefficients from the initial cluster; K1 = stationary autoregressive; M1, M2 = FEM-VARX with predictions as described in Section 2.4.1, where M1 is deterministic evolution of the cluster affiliation $p(t)$, and M2 uses realizations of $p$ generated from the estimated probability transition matrix $T$. Bottom panels: RMSE and PC as a function of lead time for the various prediction methods, including P1 = persistence as a benchmark. The dashed black line is the potential predictive skill of non-stationary FEM-VARX, where predictions were run over the training period using the known optimal model affiliation function $\Gamma(t)$.
4.2 Sea ice anomalies
The target observables considered thus far for prediction have been data driven, and as such influenced by the data analysis algorithm. Hence there is no objective ground truth available when predicting these modes beyond the training period on which the data analysis was performed; while in this case the NLSA algorithm was used, other data analysis methods such as EOFs would suffer the same drawback. We wish to test our prediction method on an observable that is objective, in the sense that it can be computed independently of the data analysis algorithm, for which we turn
[Figure 8: top left, FEM-VARX cluster affiliation and reconstructed mode over the training period; top right, sample trajectories; bottom, RMSE and PC over 60 months for the prediction methods P1, P2, K1, M1, M2, and PT.]
Fig. 8 Top left: Snapshot of the true HadISST $\phi^{SI}_{L1}$ (NPGO) trajectory (black) with stationary (blue) and non-stationary $K = 2$ (red) FEM-VARX model trajectories, along with the corresponding model affiliation function $\Gamma(t)$ for the non-stationary case. Top right: Sample trajectories for various prediction methods; see Figure 7 for details of the methods. Bottom panels: RMSE and PC as a function of lead time for the various prediction methods. The dashed black line is the potential predictive skill of non-stationary FEM-VARX, where predictions were run over the training period using the known optimal model affiliation function $\Gamma(t)$.
to integrated sea ice extent anomalies, as defined in Equation (10). We can compute the time series of sea ice anomalies from the out-of-sample set directly (relative to the training set climatology), which will be our ground truth, and use the Laplacian pyramid approach to generate our out-of-sample extension predictions. This observable does not have a tight expansion in the eigenfunction basis, so the geometric harmonics method of extension would be ill-conditioned, and is thus not considered. We note that in this approach, there are reconstruction errors at time lag $\tau = 0$, so at very short time scales we cannot outperform persistence. We consider a range of truncation levels for the number of ensemble members used, which are nearest neighbors to the out-of-sample data point, as determined by the kernel function. Using all available neighbors will likely overly smooth and average out features in forward trajectories, while using too few neighbors will place too much weight on particular trajectories. Indeed we find good performance using 100 neighbors (out of a total possible 4791). The top left panel of Figure 9 shows a snapshot of the true sea ice extent anomalies, together with a reconstructed out-of-sample extension using the Laplacian pyramid. To be clear, this is not a prediction trajectory; rather, each point in the out-of-sample extension is calculated using Equation (9), that is, each point is a time-lead $\tau = 0$ reconstruction.
[Figure 9: top left, true and out-of-sample extension North Pacific sea ice anomaly (km²) over 120 months; top right, sample anomaly prediction; bottom, RMSE and PC over 15 months for ensembles using all, 100, and 10 nearest neighbors (nN) and persistence (P).]
Fig. 9 Top left: True sea ice cover anomalies plotted with the corresponding Laplacian pyramid out-of-sample extension function. The other panels show prediction results using Laplacian pyramids for total North Pacific sea ice cover anomalies in CCSM4 data. The number of nearest neighbors (nN) used to form the ensemble was varied, and we find the best performance when the ensemble is restricted to the nearest 100 neighbors, corresponding to about 2% of the total sample size.
The top right panel shows sample snapshots of prediction trajectories, restricting the ensemble size to the nearest 10, 100, and then all nearest neighbors. Notice in particular that the predictions match the truth when the anomalies are close to 0, but may subsequently progress with the opposite sign to the truth. As our prediction metrics are averaged over initial conditions spanning all months, the difficulty the predictions have in projecting from a state of near 0 anomaly significantly hampers the ability for long-range predictability of this observable.
In the bottom two panels of Figure 9 we have the averaged error metrics, and see year-to-year correlations manifesting as a dip/bump in RMSE and PC in the persistence forecast that occurs after 12 months. After the first month lag time, the kernel ensemble analog forecasts overcome the reconstruction error and beat persistence in both RMSE and PC, and give about a 2 month increase in prediction skill (as measured by when the PC drops below 0.6) over persistence. We see the best performance restricting the ensemble size to 100 nearest neighbors (about 2% of the total sample size) in both the RMSE and PC metrics, though the advantage is marginal before the error metrics drop below the 0.6 threshold.
Pushing the prediction strategy to an even more difficult problem, in Figure 10 we try to predict observational sea ice extent anomalies using CCSM4 model data as training data. In this scenario, without knowledge of the test data climatology, the observational sea ice extent anomalies are defined using the CCSM4 climatology. In the top panel of Figure 10 we see the strong bias that results, as the observational record has less sea ice than the CCSM model climatology, which was taken from a pre-industrial control run. This strongly hampers the ability to accurately predict observational sea ice extent anomalies using CCSM4 model ensemble analogs, and as a result the only predictive skill we see is from the annual cycle.
[Figure 10: top left, true and out-of-sample extension North Pacific sea ice anomaly (km²) over 120 months; top right, sample anomaly prediction; bottom, RMSE and PC over 15 months for the Laplacian pyramid forecast (LP) and persistence (P).]
Fig. 10 Prediction results for total observed (HadISST data) North Pacific sea ice extent anomalies, using CCSM4 as training data. The ice cover function is out-of-sample extended via Laplacian pyramid, using 100 nearest neighbors.
5 Discussion
We have examined a recently proposed prediction strategy employing a kernel ensemble analog forecasting scheme that makes use of out-of-sample extension techniques. These nonparametric, data-driven methods make no assumptions about the underlying governing dynamics or statistics. We have used these methods in conjunction with NLSA to extract low-frequency modes of variability from North Pacific SST and SIC data sets, both from models and observations. We find that for these low-frequency modes, the analog forecasting performs at least as well as, and in many cases better than, the simple constant persistence forecast. Predictive skill, as measured by the PC exceeding 0.6, can be increased by up to 3 to 6 months for low-frequency modes of variability in the North Pacific. This is a strong advantage over traditional parametric regression models, which were shown to fail to beat persistence.
The kernel ensemble analog forecasting methods outlined included two variations on the underlying out-of-sample extension scheme, each with its strengths and weaknesses. The geometric harmonics method, based on the Nyström method, worked well for observables that are band-limited in the eigenfunction basis, in particular the eigenfunctions themselves. For observables not easily expressed in such a basis, however, the Laplacian pyramid provides an alternative method based on a multiscale decomposition of the original observable.
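To make the band-limited restriction concrete, the sketch below performs a generic Nyström-style extension of an observable expanded in the leading eigenvectors of a kernel matrix; truncating to n_eig modes is precisely why such schemes suit band-limited observables. This illustrates the structure of the approach only, not the specific geometric harmonics construction, and the Gaussian kernel and all names are our own choices.

import numpy as np

def nystrom_extend(x_new, X, f, epsilon, n_eig=20):
    """Nystrom-style out-of-sample extension of an observable f (sketch).

    X : (n, d) training points; f : (n,) observable values on X.
    """
    # Gaussian kernel matrix on the training set and its leading eigenpairs.
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    lam, phi = np.linalg.eigh(np.exp(-D2 / epsilon))
    lam, phi = lam[::-1][:n_eig], phi[:, ::-1][:, :n_eig]
    # Coefficients of f in the truncated (band-limited) eigenvector basis.
    c = phi.T @ f
    # The kernel row from the new point extends each eigenvector (Nystrom formula).
    k_new = np.exp(-np.sum((X - x_new) ** 2, axis=1) / epsilon)
    phi_new = (k_new @ phi) / lam
    return phi_new @ c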
While the low-frequency eigenfunctions from NLSA were a natural preferred class of observables to target for prediction, we also studied the case of objective observables uninfluenced by the data analysis algorithm. Motivated by the strong reemergence phenomena, we considered sea ice extent anomalies as our target for prediction in the North Pacific. Using a shorter embedding window, owing to the faster (seasonal) time scale dynamics, we obtain approximately a two month increase in predictive skill over the persistence forecast. It is also evident that when considering regional sea ice extent anomalies, winds play a large role in moving ice into and out of the domain of interest, and as such the atmospheric component of the system could additionally be included in the multivariate kernel function, despite its weaker low-frequency variability.
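The embedding step referred to here is a standard Takens-style delay embedding, sketched below with a purely illustrative window length; the window lengths actually used for each observable are those stated earlier in the paper.

import numpy as np

def lag_embed(x, q):
    """Takens-style delay embedding: each row concatenates q consecutive samples.

    x : (n,) or (n, d) time series; returns an (n - q + 1, q * d) trajectory.
    """
    x = np.atleast_2d(np.asarray(x).T).T  # promote (n,) to (n, 1)
    n = len(x)
    return np.hstack([x[i:n - q + 1 + i] for i in range(q)])

# A shorter window suits faster (seasonal) dynamics such as sea ice extent
# better than the longer windows used for decadal SST modes; q = 6 is illustrative.
series = np.random.default_rng(1).standard_normal(240)
X_embedded = lag_embed(series, q=6)  # shape (235, 6)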
An important consideration is that our prediction metrics are averaged over initial conditions ranging over all possible initial states of the system. As we saw clearly in the case of North Pacific sea ice extent anomalies, these prediction strategies can have difficulty projecting from an initial state of quiescence, and can easily predict the wrong sign of an active state, greatly hampering predictive skill. On the other hand, we would expect predictive skill to be stronger for those initial states that begin in a strongly active state, or, said differently, clearly in one climate regime as opposed to in transition between the two. Future work will further explore conditional forecasting, where we condition forecasts on either the initial month or the target month; a sketch of such stratification follows below. Extending this analysis to the North Atlantic, another region of strong low-frequency variability, is also a natural progression of this work.
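A minimal version of that conditioning simply stratifies the verification by initial calendar month before averaging, as in the sketch below; it assumes forecasts are stored per initial condition with known initial months, and the name pc_by_init_month is ours.

import numpy as np

def pc_by_init_month(forecasts, truths, init_months):
    """Pattern correlation vs lead, conditioned on initial calendar month.

    forecasts, truths : (n_init, n_lead); init_months : (n_init,) in 0..11.
    Returns a (12, n_lead) array with one PC curve per initial month.
    """
    out = np.full((12, forecasts.shape[1]), np.nan)
    for m in range(12):
        sel = init_months == m
        if sel.sum() < 2:
            continue  # too few cases to form anomalies for this month
        fa = forecasts[sel] - forecasts[sel].mean(axis=0)
        ta = truths[sel] - truths[sel].mean(axis=0)
        out[m] = (fa * ta).sum(axis=0) / np.sqrt(
            (fa ** 2).sum(axis=0) * (ta ** 2).sum(axis=0))
    return out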
Acknowledgements The research of Andrew Majda and Dimitrios Giannakis is partially supported by ONR MURI grant 25-74200-F7112. Darin Comeau is supported as a postdoctoral fellow through this grant. The research of Dimitrios Giannakis is also partially supported by ONR DRI grant N00014-14-0150.
References

Alessandrini S, Delle Monache L, Sperati S, Nissen J (2015) A novel application of an analog ensemble for short-term wind power forecasting. Renewable Energy 76:768–781

Alexander MA, Deser C, Timlin MS (1999) The reemergence of SST anomalies in the North Pacific Ocean. Journal of Climate 12(8):2419–2433

Berry T, Cressman JR, Gregurić-Ferenček Z, Sauer T (2013) Time-scale separation from diffusion-mapped delay coordinates. SIAM Journal on Applied Dynamical Systems 12(2):618–649

Blanchard-Wrigglesworth E, Bitz CM (2014) Characteristics of Arctic sea-ice thickness variability in GCMs. Journal of Climate 27(21):8244–8258

Blanchard-Wrigglesworth E, Armour KC, Bitz CM, DeWeaver E (2011a) Persistence and inherent predictability of Arctic sea ice in a GCM ensemble and observations. Journal of Climate 24(1):231–250

Blanchard-Wrigglesworth E, Bitz C, Holland M (2011b) Influence of initial conditions and climate forcing on predicting Arctic sea ice. Geophysical Research Letters 38(18)

Branstator G, Teng H, Meehl GA, Kimoto M, Knight JR, Latif M, Rosati A (2012) Systematic estimates of initial-value decadal predictability for six AOGCMs. Journal of Climate 25(6):1827–1846

Bushuk M, Giannakis D, Majda AJ (2014) Reemergence mechanisms for North Pacific sea ice revealed through nonlinear Laplacian spectral analysis. Journal of Climate 27(16):6265–6287

Chen N, Majda AJ (2015a) Predicting the cloud patterns for the boreal summer intraseasonal oscillation through a low-order stochastic model. Mathematics of Climate and Weather Forecasting

Chen N, Majda AJ (2015b) Predicting the real-time multivariate Madden-Julian oscillation index through a low-order nonlinear stochastic model. Monthly Weather Review

Chen N, Majda A, Giannakis D (2014) Predicting the cloud patterns of the Madden-Julian oscillation through a low-order nonlinear stochastic model. Geophysical Research Letters 41(15):5612–5619

Coifman RR, Lafon S (2006a) Diffusion maps. Applied and Computational Harmonic Analysis 21(1):5–30

Coifman RR, Lafon S (2006b) Geometric harmonics: a novel tool for multiscale out-of-sample extension of empirical functions. Applied and Computational Harmonic Analysis 21(1):31–52

Collins M (2002) Climate predictability on interannual to decadal time scales: the initial value problem. Climate Dynamics 19(8):671–692

Collins WD, Bitz CM, Blackmon ML, Bonan GB, Bretherton CS, Carton JA, Chang P, Doney SC, Hack JJ, Henderson TB, et al (2006) The Community Climate System Model version 3 (CCSM3). Journal of Climate 19(11):2122–2143

Day J, Tietsche S, Hawkins E (2014) Pan-Arctic and regional sea ice predictability: Initialization month dependence. Journal of Climate 27(12):4371–4390

Di Lorenzo E, Schneider N, Cobb K, Franks P, Chhak K, Miller A, McWilliams J, Bograd S, Arango H, Curchitser E, et al (2008) North Pacific Gyre Oscillation links ocean climate and ecosystem change. Geophysical Research Letters 35(8)

Drosdowsky W (1994) Analog (nonlinear) forecasts of the Southern Oscillation index time series. Weather and Forecasting 9(1):78–84

Frankignoul C, Hasselmann K (1977) Stochastic climate models, Part II: Application to sea-surface temperature anomalies and thermocline variability. Tellus 29(4):289–305

Franzke C, Crommelin D, Fischer A, Majda AJ (2008) A hidden Markov model perspective on regimes and metastability in atmospheric flows. Journal of Climate 21(8):1740–1757

Franzke C, Horenko I, Majda AJ, Klein R (2009) Systematic metastable atmospheric regime identification in an AGCM. Journal of the Atmospheric Sciences 66(7):1997–2012

Gent PR, Danabasoglu G, Donner LJ, Holland MM, Hunke EC, Jayne SR, Lawrence DM, Neale RB, Rasch PJ, Vertenstein M, et al (2011) The Community Climate System Model version 4. Journal of Climate 24(19):4973–4991

Ghil M, Allen M, Dettinger M, Ide K, Kondrashov D, Mann M, Robertson AW, Saunders A, Tian Y, Varadi F, et al (2002) Advanced spectral methods for climatic time series. Reviews of Geophysics 40(1):3-1

Giannakis D, Majda AJ (2012a) Comparing low-frequency and intermittent variability in comprehensive climate models through nonlinear Laplacian spectral analysis. Geophysical Research Letters 39(10)

Giannakis D, Majda AJ (2012b) Limits of predictability in the North Pacific sector of a comprehensive climate model. Geophysical Research Letters 39(24)

Giannakis D, Majda AJ (2012c) Nonlinear Laplacian spectral analysis for time series with intermittency and low-frequency variability. Proceedings of the National Academy of Sciences 109(7):2222–2227

Giannakis D, Majda AJ (2013) Nonlinear Laplacian spectral analysis: capturing intermittent and low-frequency spatiotemporal patterns in high-dimensional data. Statistical Analysis and Data Mining 6(3):180–194

Giannakis D, Majda AJ (2014) Data-driven methods for dynamical systems: Quantifying predictability and extracting spatiotemporal patterns. Mathematical and Computational Modeling: With Applications in Engineering and the Natural and Social Sciences, p 288

Hasselmann K (1976) Stochastic climate models, Part I: Theory. Tellus 28(6):473–485

Holland MM, Bitz CM, Hunke EC, Lipscomb WH, Schramm JL (2006) Influence of the sea ice thickness distribution on polar climate in CCSM3. Journal of Climate 19(11):2398–2414

Holland MM, Bailey DA, Briegleb BP, Light B, Hunke E (2012) Improved sea ice shortwave radiation physics in CCSM4: the impact of melt ponds and aerosols on Arctic sea ice. Journal of Climate 25(5):1413–1430

Horenko I (2010a) On clustering of non-stationary meteorological time series. Dynamics of Atmospheres and Oceans 49(2):164–187

Horenko I (2010b) On the identification of nonstationary factor models and their application to atmospheric data analysis. Journal of the Atmospheric Sciences 67(5):1559–1574

Horenko I (2011a) Nonstationarity in multifactor models of discrete jump processes, memory, and application to cloud modeling. Journal of the Atmospheric Sciences 68(7):1493–1506

Horenko I (2011b) On analysis of nonstationary categorical data time series: Dynamical dimension reduction, model selection, and applications to computational sociology. Multiscale Modeling & Simulation 9(4):1700–1726

Hunke E, Lipscomb W (2008) CICE: The Los Alamos sea ice model, documentation and software, version 4.0. Tech. Rep. LA-CC-06-012, Los Alamos National Laboratory

Lorenz EN (1969) Atmospheric predictability as revealed by naturally occurring analogues. Journal of the Atmospheric Sciences 26(4):636–646

Majda AJ, Harlim J (2013) Physics constrained nonlinear regression models for time series. Nonlinearity 26(1):201

Majda AJ, Yuan Y (2012) Fundamental limitations of ad hoc linear and quadratic multi-level regression models for physical systems. Discrete and Continuous Dynamical Systems B 17(4):1333–1363

Mantua NJ, Hare SR, Zhang Y, Wallace JM, Francis RC (1997) A Pacific interdecadal climate oscillation with impacts on salmon production. Bulletin of the American Meteorological Society 78(6):1069–1079

Metzner P, Putzig L, Horenko I (2012) Analysis of persistent nonstationary time series and applications. Communications in Applied Mathematics and Computational Science 7(2):175–229

Newman M, Compo GP, Alexander MA (2003) ENSO-forced variability of the Pacific decadal oscillation. Journal of Climate 16(23):3853–3857

Nyström EJ (1930) Über die praktische Auflösung von Integralgleichungen mit Anwendungen auf Randwertaufgaben. Acta Mathematica 54(1):185–204

Overland J, Rodionov S, Minobe S, Bond N (2008) North Pacific regime shifts: Definitions, issues and recent transitions. Progress in Oceanography 77(2):92–102

Overland JE, Percival DB, Mofjeld HO (2006) Regime shifts and red noise in the North Pacific. Deep Sea Research Part I: Oceanographic Research Papers 53(4):582–588

Rabin N, Coifman RR (2012) Heterogeneous datasets representation and learning using diffusion maps and Laplacian pyramids. In: SDM, SIAM, pp 189–199

Rayner N, Parker DE, Horton E, Folland C, Alexander L, Rowell D, Kent E, Kaplan A (2003) Global analyses of sea surface temperature, sea ice, and night marine air temperature since the late nineteenth century. Journal of Geophysical Research: Atmospheres 108(D14)

Sauer T, Yorke JA, Casdagli M (1991) Embedology. Journal of Statistical Physics 65(3-4):579–616

Smith R, Jones P, Briegleb B, Bryan F, Danabasoglu G, Dennis J, Dukowicz J, Eden C, Fox-Kemper B, Gent P, et al (2010) The Parallel Ocean Program (POP) reference manual: Ocean component of the Community Climate System Model (CCSM). Los Alamos National Laboratory, LAUR-10-01853

Takens F (1981) Detecting strange attractors in turbulence. Springer

Tietsche S, Day J, Guemas V, Hurlin W, Keeley S, Matei D, Msadek R, Collins M, Hawkins E (2014) Seasonal to interannual Arctic sea ice predictability in current global climate models. Geophysical Research Letters 41(3):1035–1043

Xavier PK, Goswami BN (2007) An analog method for real-time forecasting of summer monsoon subseasonal variability. Monthly Weather Review 135(12):4149–4160

Zhao Z, Giannakis D (2014) Analog forecasting with dynamics-adapted kernels. arXiv preprint arXiv:1412.3831