
Identifying multiple spatiotemporal patterns: A refined view on terrestrial photosynthetic activity

Miguel D. Mahecha a,b,*, Lina M. Fürst a,c, Nadine Gobron d, Holger Lange e

a Biogeochemical Model-Data Integration Group, Max Planck Institute for Biogeochemistry, P.O. Box 10 01 64, 07701 Jena, Germany

b Institute for Atmospheric and Climate Science, ETH Zurich, Universitätsstrasse 16, 8092 Zurich, Switzerland

c Ecological Modelling, BayCEER, University of Bayreuth, 95440 Bayreuth, Germany

d European Commission, Joint Research Centre, Institute for Environment and Sustainability, TP 272, Via Enrico Fermi 2749, 21027 Ispra (VA), Italy

e Norsk Institutt for Skog og Landskap, P.O. Box 115, 1431 Ås, Norway

Article info

Article history: Received 10 September 2009; Available online 17 July 2010; Communicated by S. Aksoy

Keywords: Spatiotemporal data; Nonlinear dimensionality reduction; Isomap; Time series analysis; Singular spectrum analysis; FAPAR

Abstract

Information retrieval from spatiotemporal data cubes is key to earth system sciences. Respective analyses need to consider two fundamental issues: First, natural phenomena fluctuate on different time scales. Second, these characteristic temporal patterns induce multiple geographical gradients. Here we propose an integrated approach of subsignal extraction and dimensionality reduction to extract geographical gradients on multiple time scales. The approach is exemplified using global remote sensing estimates of photosynthetic activity. A wide range of partly well interpretable gradients is retrieved. For instance, well known climate-induced anomalies in FAPAR over Africa and South America during the last severe ENSO event are identified. Also, the precise geographical patterns of the annual–seasonal cycle and its phasing are isolated. Other features lead to new questions on the underlying environmental dynamics. Our method can provide benchmarks for comparisons of data cubes and model runs, and thus be used as a basis for sophisticated model performance evaluations.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Data streams in earth system sciences often rely on ground-based measurements or remote sensing data of large geographical and temporal coverage. Today we are confronted with an enormous amount of observations, or model data, to be analyzed and compared. Moreover, high data acquisition rates increase the urgency for developing powerful tools to characterize the underlying patterns (Mjolsness and DeCoste, 2001). Specifically in earth system sciences, the problem of extracting relevant patterns from such data cubes embraces two challenges: First, real world time series vary on multiple scales. They contain a wide range of quasi-oscillatory fluctuations but also stochastic variations, and extreme events (Ghil et al., 2002; Stoy et al., 2009). Second, it must be considered that geographical patterns of similarities among time series vary in an inhomogeneous manner (Perry et al., 2002). This means that similarities of records may decay differently in various directions; their geographical autocorrelation is a local function of direction.

Formally, a spatiotemporal time series can be denoted as a matrix $Y = \{y_{t,l}\}$, where $t = 1, \ldots, N$ are the temporal replicates and $l = 1, \ldots, L$ indexes geographical locations lying on a 2D grid (Fig. 1a). Individual time series $Y_l = \{y_t\}$, where $t = 1, \ldots, N$, may consist of separable subsignals, $Y_l = \sum_f X_{l,f}$, where $f$ are the discrete frequency classes. A conventional approach to handle $Y$ would be to circumvent the problem of geographical heterogeneity; for example, through working with spatially aggregated time series. Alternatively, the temporal behavior might be simplified by investigating maps of spectral powers (Barbosa et al., 2009), or visualizing additional properties of $Y_l$; for example, the scaling behavior (Zhan, 2008). Summarizing data in this way is helpful since the resulting patterns are easily accessible to subsequent interpretations. Interesting scale specific features in the time–frequency domain, however, might be lost.

The aim of this study is to uncover multiple geographical patterns at distinct temporal scales separately. Conceptually, the study proposes integrating time series analysis and dimensionality reduction methods: Time series analysis is applied to extract subsignals $X_{l,f}$ from the observed time series $Y_l$. Applying such techniques to $Y$ leads to a series of new spatiotemporal data cubes $X_f$ (Fig. 1b–d). Due to expected regularities in the subsignals (e.g. quasi-oscillatory patterns), we assume that the so-called "ambient space" (the $N$ dimensional space spanned by the temporal replicates) contains highly redundant information and justifies the application of dimensionality reduction methods. The expectation is that reducing the redundancy in the temporal replicates of the subsignal data cubes (Fig. 1b–d) could reveal independent representative sets of spatial patterns at distinct temporal scales. Such an approach could, for example, inform the analyst if the dynamics in a specific frequency band $f$ are describable by a single geographical gradient, or if multiple overlying patterns are required to explain its spatial behavior.

* Corresponding author at: Biogeochemical Model-Data Integration Group, Max Planck Institute for Biogeochemistry, P.O. Box 10 01 64, 07701 Jena, Germany.

E-mail addresses: [email protected], [email protected] (M.D. Mahecha).

Pattern Recognition Letters 31 (2010) 2309–2317

0167-8655/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2010.06.021

journal homepage: www.elsevier.com/locate/patrec

The proposed methodological advance is relevant to numerous tasks in earth system sciences; for instance, the investigation of the global carbon cycle and its feedback with the climate system. In this area of application, we have to essentially deal with climate data, remotely sensed (bio)physical variables, and global model results that often sample large geographical areas. Analyzing these data, comparing different sources of observations, or contrasting simulations and observations in terms of model benchmarking is a prerequisite for understanding the underlying biogeochemical processes (Heimann et al., 2008). Relevant biogeochemical processes vary on multiple time scales: Ecosystem–atmosphere interactions follow meteorological conditions, annual–seasonal variations, or decadal climate-induced patterns (Mahecha et al., 2007b; Stoy et al., 2009). Thus, an in-depth spatially explicit multi-scale analysis of the system behavior is of fundamental importance.

To exemplify our approach, we explore a remote sensing dataset of the "Fraction of Absorbed Photosynthetically Active Radiation" (FAPAR). This state variable can serve as a proxy for assessing the spatiotemporal variability of photosynthetic activity (Jung et al., 2008). It is therefore a central variable in diagnostic studies of biospheric responses to climate forcing and other environmental constraints or impacts. More importantly, different FAPAR remote sensing products play an important role as input variables for process-based diagnostic models that simulate the productivity of terrestrial ecosystems (Seixas et al., 2009). Given that the spatiotemporal characteristics of FAPAR are largely unexplored, we need a detailed characterization of the contained spatial gradients on multiple time scales in order to understand and scrutinize the validity of the corresponding model simulations of global CO2 fluxes in the near future.

2. Methods

2.1. Example data

The FAPAR is collected by the Sea-viewing Wide Field-of-View Sensor (SeaWiFS NASA, Gobron et al., 2006, http://fapar.jrc.ec.europa.eu). Here we explore the 10-day composites from 1998 to 2005 globally provided at a 0.5° grid (for a comparison of these data with other FAPAR products, cf. McCallum et al., 2010). In certain geographical areas, these FAPAR time series are fragmented due to cloudy conditions or snow coverage. The latter introduces recurrent gaps affecting our analyses. Therefore, we exclude time series with ≥30% missing values (Fig. 2). The remaining observations undergo a double gap-filling procedure: Gaps coinciding with periods of low land surface temperature (monthly $T_{air} \le 0$ °C) are filled with white noise scaled to the range of the lower 20% quantiles of the observations. This simulates some natural backscatter, since the occurrence of relevant photosynthetic activity is implausible. The required ancillary information is derived from CRU climate data (New et al., 2002) homogenized and extended to 2005, as described by Österle et al. (2003). The remaining (generally randomly distributed) gaps are filled by an extended version of Singular System Analysis (SSA, see below). The suitability of this gap-filling procedure has been extensively tested for geoscientific records by Kondrashov and Ghil (2006), and we followed their instructions exactly for the univariate (pixel-by-pixel) gap-filling in processing the FAPAR data (see footnote 1). The original data cube contains $L = 57{,}500$ geographical observations at $N = 288$ time steps. The geographical resolution is later reduced to a 1° grid (see Section 2.2). However, the aggregation is only realized if at least two FAPAR time series fall within a 1° grid cell, so that we finally work with $L = 11{,}324$ spatial samples.

Fig. 1. A cutout of the original FAPAR data (first 2 years) aggregated to 1° geographical and monthly temporal resolution. The data cube can be decomposed, e.g. by Singular System Analysis (SSA), and reconstructed as sets of distinct frequency classes (lower row). The subsignals add up to the original data without loss of information, except for the mean value (the subsignals are centered to zero-mean).

2.2. Time series analysis

Subsignal extraction is based on Singular System Analysis (SSA), originally developed by Broomhead and King (1986). Initially, one has to define a length $P$ for a window sliding along the time series $Y_l$. This leads to a trajectory matrix consisting of $K = N - P + 1$ time-lagged vectors of $Y_l$. Decomposing the trajectory matrix into orthogonal components (EOFs) allows for reconstructing the time series partially (for details on algorithmic variants, see, e.g. Elsner and Tsonis, 1996; Golyandina et al., 2001; Ghil et al., 2002). Each subsignal $X_{l,f}$ is characterized by a clear periodicity, but might exhibit a certain degree of phase and amplitude modulation ("quasi-oscillatory subsignals", Paluš and Novotná, 2006).
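The basic decomposition can be illustrated with a minimal sketch. It assumes one common algorithmic variant (a singular value decomposition of the trajectory matrix followed by diagonal averaging); the function name `ssa_decompose` is hypothetical, and the code is not the implementation used in this study.

```python
import numpy as np

def ssa_decompose(y, P):
    """Return the elementary reconstructed components of a 1-D series.

    Illustrative variant only: SVD of the trajectory matrix, then
    diagonal averaging of each rank-one piece.
    """
    y = np.asarray(y, dtype=float)
    N = y.size
    K = N - P + 1
    # Trajectory matrix: column j holds the lagged vector y[j:j+P]
    traj = np.column_stack([y[j:j + P] for j in range(K)])  # shape (P, K)
    U, s, Vt = np.linalg.svd(traj, full_matrices=False)
    components = np.empty((s.size, N))
    for i in range(s.size):
        elem = s[i] * np.outer(U[:, i], Vt[i])  # rank-one piece, shape (P, K)
        # Diagonal averaging: average all entries with row + column == t
        flipped = elem[::-1]
        components[i] = [flipped.diagonal(t - (P - 1)).mean() for t in range(N)]
    return components
```

Summing all rows of the returned array recovers the input series; a subsignal for one frequency class would be obtained by summing only the rows whose dominant period falls into that class, a grouping step that is left out here.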

Two numerical problems affect SSA: First, unavoidable edge effects during the reconstruction process, and second, weak subsignal separability due to non-orthogonal reconstructed components. Edge effects are reduced here by artificially extending the time series to $N + 2P$ through replicating the first and last years of the time series. Only the original points are later evaluated. Inaccurate subsignal separation is slightly reduced by subjecting time series from a 0.5° × 0.5° geographical grid to SSA but aggregating the retrieved subsignals to a lower resolution of 1° × 1°.

The choice of frequency classes is subjective and constrained by the length of the FAPAR series, its temporal resolution, and the notoriously prominent annual cycle. For simplicity, we use only three intuitive frequency classes: The annual–seasonal scale, framed by two bands of lower and higher frequencies, respectively. The annual–seasonal bin is defined by periods in the interval {0.375; 1.25} years. In accordance with Stine et al. (2009) we expect here considerable phase and amplitude modulations. Hence, a local variant of SSA is deployed for retrieving the annual–seasonal variability following the principle proposed by Yiou et al. (2000), summarized here in the form of a "how to" recipe:

Step 1: Define a moving window of length $W \ll N$ (here $W = 36$, corresponding to 1 year) on the time series $Y_l$. The windows are centered on $b = \tfrac{1}{2}W, W, 1\tfrac{1}{2}W, \ldots, N - \tfrac{1}{2}W$.

Step 2: Apply SSA on each window separately. With the intention of reducing edge effects as described above, and using the artificial extension (here, replicating the time series five times), it is possible to work with an embedding dimension of $P = 1.25$ years.

Step 3: Identify the annual–seasonal SSA component in each window and ignore the rest, such that each $X_l(b)$ contains the annual–seasonal variability.

Step 4: Merge the locally reconstructed time series $X_l$. Since each time point is now covered by two local reconstructions, the overall annual–seasonal component is computed by the weighted sum of the local reconstructions. Therein, each point in the local reconstruction receives a weight scaled to zero at its edges and one in the center (the transition is chosen to be a sine function, but other weighting functions, e.g. a triangle, might work equally well). The weighted local reconstructions are merged to a smooth reconstruction of the entire time series. This allows extracting an accurate representation of the annual–seasonal periodicities.
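The merging in Step 4 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `merge_local_reconstructions` is hypothetical, the sine taper is sampled at half-sample offsets so that no weight is exactly zero, and the weighted sum is divided by the sum of weights (a normalization that the recipe above does not spell out).

```python
import numpy as np

def merge_local_reconstructions(local, centers, N):
    """Blend overlapping window reconstructions into one series of length N.

    local   : sequence of 1-D arrays, each of length W (one per window)
    centers : window centre positions in samples, spaced W/2 apart
    """
    merged = np.zeros(N)
    weight_sum = np.zeros(N)
    for x_b, b in zip(local, centers):
        x_b = np.asarray(x_b, dtype=float)
        W = x_b.size
        idx = np.arange(W) + int(b) - W // 2
        # Sine taper: close to zero at the window edges, close to one at the centre
        w = np.sin(np.pi * (np.arange(W) + 0.5) / W)
        keep = (idx >= 0) & (idx < N)
        merged[idx[keep]] += w[keep] * x_b[keep]
        weight_sum[idx[keep]] += w[keep]
    # Assumed normalization: weighted average rather than raw weighted sum
    return merged / weight_sum
```

With windows of length $W$ centered every $W/2$ samples, interior points are covered by two windows, while the first and last half-windows are covered by only one.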

We further use the annual–seasonal periodicity to produce deseasonalized time series. These data are then used to obtain low frequency components using a maximum embedding window $P = 3$ years with the conventional SSA. While these low frequency modes capture periodicities >1.25 years, residual fluctuations represent the high frequency components covering the variability up to periods of 4.5 months. We can assume that, for instance, stochastic and extreme events are contained here. Finally, we have three data cubes $X_f$, each belonging to a specific frequency class. Their sum is equivalent to a full reconstruction of the original data cube $Y$.

2.3. Dimensionality reduction

A fundamental property of most geophysical data sets is redundancy: The data points can be assumed to lie on a manifold of a much smaller dimension than the ambient space. In the present example, the ambient space is spanned by the temporal replicates $N$. Our primary goal is the construction of a low-dimensional representation of the observations, such that the inherent patterns of the data cube are extracted and become accessible. Technically, this requires a mapping of the ambient space into a low-dimensional space $H \in \mathbb{R}^m$, $m \ll N$, which ideally is a suitable approximation of the underlying manifold. The process of dimensionality reduction has to consider that the underlying manifold could be characterized by a complex topology. Non-Euclidean manifolds are likely to occur in many environmental problems, in particular when the observations are sampled along large geographical gradients.

Among methods for dimensionality reduction, Classical Multidimensional Scaling (CMDS) has played an important role. The objective is minimizing the cost

$g = \lVert s(D^{(x)}) - s(D^{(h)}) \rVert_{L^2}$.  (1)

Here, $s$ double centers the symmetric point-wise distance matrix, which is either estimated from the data space ($D^{(x)}$) or in the embedding space ($D^{(h)}$). The global minimum of $g$ is usually found by solving an eigenvector–eigenvalue problem on $D^{(x)}$ (for an in-depth introduction to CMDS, see Cox and Cox, 2001). The so-called "principal coordinates" or "dimensions" are then found by scaling the eigenvectors by the square-rooted corresponding (non-negative) eigenvalues. To express it in the context of the present application: Each grid cell in a subsignal data cube will be represented in the new space, ideally spanned by very few dimensions. Success or failure of CMDS depends on the chosen distance metric in $D^{(x)}$. For non-flat manifolds, the often used Euclidean distances in $D^{(x)}$ are inappropriate, since a Euclidean distance metric $d^{(x)}_{i,j}$ underestimates the relationship between items (recognized early on by Williamson (1978)). Tenenbaum et al. (2000) proposed estimating the distances along nonlinear manifolds as follows: A number of nearest neighbors is assigned to each point in the ambient space. For each datum, neighboring points are found using a predefined threshold; for example, a radius $e$ or a fixed number $k$ of nearest points. Linking the captured points defines a connectivity graph, where the edge weights are set to the Euclidean distances. A novel distance measure emerges by computing the shortest path along this undirected graph using Dijkstra's algorithm (Dijkstra, 1959). The derived "geodesic distances" replace $D^{(x)}$ in Eq. (1), and thus form the basis for mapping the data cloud onto the low-dimensional embedding space. The approach is known as "Isometric Feature Mapping" (Isomap).

Fig. 2. Fraction of missing observations [%] of the FAPAR time series at a 0.5° geographical grid. The 30% isoline illustrates the threshold criterion used in the present study (red line). (For interpretation of the references in colour in this figure legend, the reader is referred to the web version of this article.)

1 The reasons for using the univariate gap-filling version of SSA instead of relying on the multivariate extension are discussed in Section 4.2.

Choosing an optimal $k$ value is critical since it fundamentally affects the stability of the mapping (Balasubramanian et al., 2002). Here, we searched for an optimal $k$ value for each of the three analyzed data cubes $Y_f$, where the criterion was to maximize the fraction of explained variance using a minimum of embedding dimensions. The straightforward approach is therefore to vary the $k$ value from a minimum where the graph connectivity is achieved to the maximum $k = N - 1$ (Mahecha et al., 2007a), where we have the conventional linear mapping (CMDS). Gámez et al. (2004) showed that under the precondition of centered time series, linear CMDS becomes equivalent to a principal component analysis (PCA). In other words: It is possible that the results of a linear dimensionality reduction outperform the embedding based on a higher degree of nonlinearity. The linear case of CMDS can occur in the present setting as a special case. However, we find that the data are best compressed choosing a nonlinear mapping with $k = 55$ for the low frequency data cube, $k = 600$ when analyzing the annual–seasonal variability, and $k = 35$ for the high frequency modes.
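The Isomap procedure of this section can be summarized in a short sketch: a $k$-nearest-neighbor graph, geodesic distances via Dijkstra's algorithm, and classical MDS on the resulting distance matrix. The function name `isomap` and its arguments are hypothetical, the all-pairs implementation is only practical for small $L$, and the code is an illustration rather than the implementation used in this study.

```python
import heapq
import numpy as np

def isomap(X, k, m):
    """Embed the rows of X (L points x N features) into m dimensions."""
    L = X.shape[0]
    # Pairwise Euclidean distances in the ambient space
    sq = np.sum(X**2, axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))
    np.fill_diagonal(D, 0.0)
    # k-nearest-neighbour graph, made undirected
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]
    adj = [dict() for _ in range(L)]
    for i in range(L):
        for j in nbrs[i]:
            adj[i][j] = adj[j][i] = D[i, j]
    # Geodesic distances: Dijkstra's algorithm started from every node
    G = np.full((L, L), np.inf)
    for src in range(L):
        G[src, src] = 0.0
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > G[src, u]:
                continue
            for v, w in adj[u].items():
                if d + w < G[src, v]:
                    G[src, v] = d + w
                    heapq.heappush(heap, (d + w, v))
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase k")
    # Classical MDS: double centring of squared distances, then eigendecomposition
    J = np.eye(L) - np.ones((L, L)) / L
    B = -0.5 * J @ (G**2) @ J
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:m]
    lam = np.maximum(evals[order], 0.0)
    # Principal coordinates: eigenvectors scaled by square-rooted eigenvalues
    return evecs[:, order] * np.sqrt(lam), lam
```

Calling `isomap(X, k, m)` for a range of `k` values and comparing how much variance the leading dimensions capture would mimic the search described above; the exact variance criterion is not reproduced here.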

2.4. Pattern extraction on multiple spatiotemporal scales

In the present study we strive to characterize the data manifold representing the photosynthetic activity on different time scales. We propose tackling this issue by combining the described methods of subsignal separation and nonlinear dimensionality reduction. In the following, we sequentially list all major steps in the chosen approach:

Step a: Data preprocessing excludes heavily gap-affected time series (Fig. 2). The retained data points undergo a gap-filling procedure.

Step b: Perform a subsignal extraction on a pixel-by-pixel basis. Each time series $Y_l$, at the geographical grid indexed by $l = 1, \ldots, L$, is separated into subsignals. Each subsignal represents the temporal variability of a specific frequency class $f$ (such that $Y_l = \sum_f X_{l,f}$). This means that the data cube $Y$ is split into several data cubes $X_f$ (see Fig. 1, and Section 2.2).

Step c: Subject each data cube $X_f$ to nonlinear dimensionality reduction via Isomap (as described in Section 2.3): Since each of the leading dimensions has $L$ entries, it represents a geographical gradient underlying the analyzed data cube $X_f$. We are now confronted with a series of time scale specific geographical patterns, which might approximate the respective manifolds.

3. Results

In the following, we separately report on the results of the dimensionality reduction of the three selected frequency bands. We put these results in the context of known environmental conditions, in order to assess the potential of our method to capture the inherent structures in the spatiotemporal variability of FAPAR.

3.1. Geographical low frequency patterns

Dimensionality reduction on the low frequency modes of variability reveals a series of coherent geographical patterns. The leading five dimensions account together for >98% of the variance in this specific data cube (Fig. 3, left column). However, the question as to whether these low frequency patterns are interpretable can only be answered in conjunction with ancillary analyses.

A comparison of the leading Isomap coordinates with trend estimates (conservatively quantified here as Sen slopes of the deseasonalized time series, Sen, 1968) reveals a reasonable relationship with the first dimension (Fig. 4a). The Sen slopes capture monotonic changes in the time series that are either "real" trends, or periodicities beyond the detectability threshold. Recall that the sign of Isomap/CMDS coordinates is a matter of convention. In this case, negative values of the first Isomap dimension (Fig. 3) correspond to positive trends (Fig. 4a). This first dimension is relatively dominant, explaining 59% of the variance in the low frequency data cube. Clearly, relevant parts of Australia, Eastern Africa, and some spots in South America undergo slow changes. To the best of our knowledge, these negative trends are not extensively discussed in the literature and require further investigation (but see Angert et al., 2005 for the Northern hemisphere). Another salient pattern is the upward trend in FAPAR identified in northern Mexico that coincides with favorable tendencies in the local hydrometeorological conditions (results not shown).
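For reference, the Sen slope of a single time series is the median of all pairwise slopes. The sketch below assumes equally spaced time steps unless a time vector is supplied; the function name `sen_slope` is hypothetical, and the Mann–Kendall significance test used for Fig. 4a is not included.

```python
import numpy as np

def sen_slope(y, t=None):
    """Median of the slopes between all pairs of observations."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size, dtype=float) if t is None else np.asarray(t, dtype=float)
    # All index pairs (i, j) with i < j
    i, j = np.triu_indices(y.size, k=1)
    slopes = (y[j] - y[i]) / (t[j] - t[i])
    return np.median(slopes)
```

Because the estimator is a median, a few outlying observations leave the slope essentially unchanged, which is why it serves here as a conservative trend measure.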

Given that the first Isomap dimension recovers the trend behavior, higher Isomap coordinates are expected to summarize the remnant patterns contained in the low frequency data cube. The respective dimensions reveal spatially coherent structures. For instance, the second dimension spans a steep gradient from South Africa to Kenya and Tanzania. At first glance, this structure corresponds to known impacts of the El Niño-Southern Oscillation (ENSO) conditions on the terrestrial biosphere (see, e.g. Kogan, 2000, 2008). ENSO is a recurrent climate anomaly with irregular oscillatory behavior, causing rainfall anomalies especially in the South American and African tropical regions. It is hypothesized that ENSO generally induces deviations in the photosynthetic activity from the long-term mean, and is the main mechanism behind year-to-year variations in the photosynthetic activity over Africa (Weber et al., 2009). Indeed, Anyamba et al. (2002) show that the particularly strong 1997/1998 ENSO event induced drought stress in equatorial East Africa. They also report a parallel temporary greening in Southern Africa. In line with their analysis, we show the monthly FAPAR anomalies during the 1998 ENSO (with respect to the following years) in Fig. 5 and contrast them with the Isomap dimension 2. Although the patterns are not totally identical, their spatial occurrences suggest that we are seeing here the same phenomenon through different eyes.

Given the consistency of our results based on the dimensionality reduction approach with known climate-related features, it is reasonable to assume that the higher Isomap dimensions derived from low frequency patterns also describe relevant fluctuations in FAPAR. These features now await further exploration based on ancillary data or using process-based model analysis.

3.2. Geographical annual–seasonal patterns

The annual–seasonal periodicities are the most obvious patterns within the time series. Periods of one year dominate the oscillatory behavior of FAPAR globally (at 67% of the land surface the annual cycle is the dominant mode of variability). Exceptions are desert regions that respond primarily to irregular rain pulses and some tropical and subtropical regions where semiannual (seasonal) modes predominate. Indeed, the first Isomap dimension (Fig. 3, upper center) of the annual–seasonal subsignals shows high absolute values where a strong annual cycle can be expected. The northern and southern hemispheres are sorted at opposite ends of the Isomap coordinates. Plotting this Isomap dimension against the spectral power in the corresponding frequency band reveals a U-shaped (second-order) relation (Fig. 4b). The color code corresponds to the latitude of the pixels, which shows that the shape follows a clear geographical pattern. We can conclude from this that the first Isomap dimension captures the patterns of spectral dominance on the one hand, but further distinguishes the phasing of the signal according to the seasonality. This also becomes clear when tracing the geographical location of the few pixels of the northern latitudes that are on the right branch of the curve: These are all the summer-dry (e.g. Mediterranean) ecosystems that have shifted growing seasons.

Fig. 3. The first five Isomap components of the FAPAR data derived from different time scales (recall Fig. 1). The first column shows the Isomap dimensions of the low frequency components, indicating a series of homogeneous geographical patterns each of which accounts for substantial fractions of variance (indicated in percentages per subplot). The central column illustrates dominant geographical patterns in annual–seasonal modes, where most of the variance is captured by very few dimensions. The Isomap dimensions retrieved from high frequency components are displayed in the last column. Percentages shown in the subplots for dimensions 1 to 5: low frequency modes ~59%, ~84%, ~93%, ~96%, ~98%; annual–seasonal modes ~82%, ~94%, ~97%, ~98%, ~98%; high frequency modes ~47%, ~66%, ~72%, ~74%, ~74%.

Fig. 4. Seeking explanations for Isomap dimensions of Fig. 3: (a) The first Isomap dimension from low frequency FAPAR modes vs. significant Sen slope [d^-1] (identified using the nonparametric Mann–Kendall test, α = 0.05). The color code illustrates the geographical latitude of the pixels. (b) The first Isomap dimension of the annual–seasonal modes shows a second-order relationship with the spectral power in the frequency band (annual–seasonal power), since Isomap differentiates the signal phasing. (For interpretation of the references in colour in this figure legend, the reader is referred to the web version of this article.)

One could claim that these findings are trivial. However, our results retrieved from the annual–seasonal modes provide a proof-of-concept and show that the spatiotemporal patterns extracted with the proposed methodology have clear environmental interpretations. The higher Isomap dimensions, in contrast, are less intuitive and more difficult to explain. Dimension 3 could be related to the gradients between areas where (due to two rain seasons per year) a double-peaked growing season is expected. However, the patterns are relatively unclear and explain only minor fractions of the variability.

3.3. Geographical high frequency patterns

Amongst the evaluated time scales, the high frequency patterns appeared to comprise the most complex spatial structures. Many Isomap dimensions are required to achieve high values of explained variance (Fig. 3). This can be explained from an environmental perspective: The high frequency patterns in FAPAR comprise here periods up to ~4.5 months. Therefore, meteorological effects and associated (possibly time-delayed) responses of the vegetation are revealed here. Compared with annual cycles and low frequency modes, the homogeneous patterns are generally of limited geographical extent. Indeed, higher Isomap dimensions increasingly uncover locally differentiated structures. We assume that these modes (retrieved as residuals from the oscillatory components) represent geographically local patterns of stochastic variations such as synoptic meteorological variability and extreme events.

4. Discussion

The results reveal a series of interesting environmental features that deserve an in-depth discussion, which is, however, beyond the scope of this paper. Therefore, we focus on the methodological issues, illustrating advantages and limitations of the proposed data analysis strategy.

4.1. Conceptual remarks

The combined application of advanced time series and dimensionality reduction techniques permits extracting coherent patterns on multiple spatiotemporal scales. The spatial coherency is interesting considering that the dimensionality reduction method has no information about the geographical locations. But is our capacity to understand the data cube substantially improved compared with a dimensionality reduction exercise on the undecomposed time series? If the latter revealed similar spatiotemporal structures, clear relationships to the spatiotemporal modes would be found. However, a correlation analysis reveals no relationship between low or high frequency patterns and the Isomap dimensions of the undecomposed time series (Fig. 6). The figure shows only that the first four Isomap dimensions retrieved from the raw (but centered) FAPAR observations correspond to the respective Isomap dimensions of the annual–seasonal subsignals. This is explicable by the fact that the annual–seasonal variability dominates the temporal dynamics in FAPAR. The failure of the Isomap embedding of the undecomposed data to detect structures equivalent to those extracted from the low and high frequency data cubes highlights the advantage of an integrative spatiotemporal analysis: Although low frequency variability in FAPAR is only of minor quantitative relevance, it is not overlooked in the proposed approach. This is a crucial asset, because long-term changes in biospheric responses are key in current climate change assessments (Piao et al., 2008). Without the preceding subsignal separation, geographical patterns assignable to different temporal scales are largely ignored.

4.2. Subsignal extraction

Our analysis applies the univariate SSA on a pixel-by-pixel basis instead of a multivariate SSA (MSSA, Ghil et al., 2002). The latter could have been an alternative for both the gap-filling procedure and the separation of subsignals (Kondrashov and Ghil, 2006). However, we consider the univariate time series analysis a necessary precondition to guarantee that no spatial correlation structure is artificially introduced into the spatial gradients retrieved in the last step of the analysis (see Section 2.4). Rather, spatially homogeneous structures should emerge from the dimensionality reduction.

To some extent, our results depended on the method selectedfor separating subsignals from univariate time series, which hasbeen a matter of debate for decades (Ghil et al., 2002). Instead ofusing SSA, other methods, including ‘‘Discrete Wavelet Transforms,DWT” (Torrence and Compo, 1998) or ‘‘Empirical Mode Decompo-sition, EMD” (Huang et al., 1998; Huang and Wu, 2008) can be usedto split the original data cube into different temporal scales. Allmentioned methods are improvements compared with the classi-cal Fourier decomposition, since they do not assume a fixed super-position of weighted sines and cosines. Instead, they allow for acertain degree of phase and amplitude modulation of the subsig-nals. One has to be aware of some differences induced by the tech-nicalities of the methods. For example, DWT and EMD may lead tohigher time localization, but SSA is expected to achieve a more

[Figure 5 panels: Jan. 1998 anomaly; Feb. 1998 anomaly; Mar. 1998 anomaly; Apr. 1998 anomaly; Isomap Dimension 2 (Low freq. modes). Map axes span −80 to 40 (horizontal) and −60 to 40 (vertical).]

Fig. 5. The monthly anomalies in FAPAR during the severe 1998 ENSO event, shown from January to April. Red colors indicate a decrease in FAPAR, i.e. a reduction in photosynthetic activity, and blue a temporary “greening”. The second Isomap dimension of the low frequency modes in FAPAR (see also Fig. 3) seems to capture a similar pattern, indicating that here we are confronted with the climate-induced interannual variability in FAPAR. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)



precise signal separation (due to orthogonality constraints in the decomposition).

In Fig. 7 we show the results of such a comparison exercise between SSA and EMD. Interestingly, very high agreement rates are found among the low frequency modes of SSA and EMD. This is surprising, since we expected that the short observation period would turn the extraction of low frequency modes into the most vulnerable step of the analysis. Apparently, the relatively short embedding dimension P did not affect the retrieval of low frequency modes via SSA. Also, the fact that the variability introduced by these components is generally low compared with annual–seasonal and high frequency modes is not a favorable prerequisite. The extraction of the annual–seasonal and high frequency modes reveals severe disagreements between SSA and EMD in some geographical areas. For instance, the estimates of the annual–seasonal modes disagree particularly in areas where agricultural land-use regimes dominate the vegetation (clearly revealed in terms of the root mean squared error, RMSE, and median absolute deviation, MAD). This is reasonable, since agriculture often leads to abrupt changes in FAPAR, e.g. due to harvest, which are very difficult to represent by smooth subsignals. Possibly, the applied local SSA did not improve the locality of the decomposition

[Figure 6 panels: correlation matrices of Isomap dimensions 1–10 of the undecomposed FAPAR data (vertical axis) against Isomap dimensions 1–10 of the low freq., annual–seasonal, and high freq. modes (horizontal axes); color scale from 0 to 1.]

Fig. 6. Correlation of Isomap coordinates derived from the original FAPAR time series (undecomposed but centered to zero mean) with the Isomap coordinates summarizing the three subsignal classes. Since eigenvectors are of arbitrary sign, absolute values of the correlations are shown. The patterns contained in the low and high frequency components show no relation with the corresponding analysis of the raw data. The spatial gradients underlying these frequency classes are totally overlooked in a conventional analysis of the raw data. For the annual–seasonal components, instead, strong relationships are found. The reason is that these modes generally account for the largest fraction of variance in the time series.

[Figure 7 panels: maps of slope, R, RMSE, MAD, and MI (rows) for the low frequency, annual–seasonal, and high frequency modes (columns). Color scale ticks: slope 0.295–1.179; R 0.246–0.986; RMSE 0.006–0.023; MAD 0.005–0.018; MI 0.155–0.621.]

Fig. 7. A comparison of the subsignal separation via SSA (Singular Spectrum Analysis) and EMD (Empirical Mode Decomposition) in the three frequency classes. Here, we assume that SSA is the target and use a series of performance measures to assess the ability of EMD to recover equivalent patterns: The slope, the correlation coefficient (R), the root mean squared error (RMSE), the median absolute deviation (MAD), and the mutual information (MI) are estimated on a pixel-by-pixel basis.



sufficiently to capture such phenomena. Less pronounced problems also affect the extraction of the high frequency variability, where the interpretation of the spatial inconsistencies among the methods is, however, less clear. The general problem in this comparison is the lack of an unbiased reference for validation purposes.
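The per-pixel agreement measures used in such a comparison could be computed along the following lines. This is a hedged sketch, not the original implementation: the least-squares slope, the reading of MAD as the median of absolute differences, and the histogram-based mutual information estimate with `bins=8` are all assumptions, and the function name `agreement_measures` as well as the toy series are hypothetical.

```python
import numpy as np

def agreement_measures(target, estimate, bins=8):
    """Compare two subsignals of one pixel; `target` is treated as the reference."""
    t = np.asarray(target, dtype=float)
    e = np.asarray(estimate, dtype=float)
    slope = np.polyfit(t, e, 1)[0]         # assumption: ordinary least-squares slope
    r = np.corrcoef(t, e)[0, 1]            # Pearson correlation coefficient
    rmse = np.sqrt(np.mean((e - t) ** 2))  # root mean squared error
    mad = np.median(np.abs(e - t))         # assumption: median of absolute differences
    # Plug-in mutual information (in nats) from a 2-D histogram; assumed estimator.
    joint, _, _ = np.histogram2d(t, e, bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    mi = float(np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz])))
    return {"slope": slope, "R": r, "RMSE": rmse, "MAD": mad, "MI": mi}

# Toy example (hypothetical): the second series damps the amplitude of the first.
rng = np.random.default_rng(2)
time = np.arange(96)
ssa_like = np.sin(2 * np.pi * time / 12)
emd_like = 0.5 * ssa_like + 0.01 * rng.normal(size=96)
scores = agreement_measures(ssa_like, emd_like)
same = agreement_measures(ssa_like, ssa_like)
```

The toy case illustrates why several measures are reported side by side: a damped but otherwise identical subsignal still correlates almost perfectly with the reference, while the slope and the RMSE reveal the amplitude mismatch.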

Obviously, the choice of subsignal extraction method deserves a cautionary reflection. The analyst has to be aware of the uncertainty in subsignal separation, which is unavoidable whatever method is chosen. Studies of contiguous spatiotemporal fields, however, have the advantage that spatial replicates reduce this problem: Neighboring grid cells act as statistical replicates. Nevertheless, bootstrapping (Golyandina et al., 2001) or more sophisticated surrogate techniques (Venema et al., 2006) could be used for quantifying the subsignal extraction uncertainty (Mahecha et al., 2010).
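To give an idea of what a surrogate time series is, the sketch below generates one by plain Fourier phase randomization. This is one of the simplest surrogate schemes and is used here only for illustration; it is not claimed to be the technique of the cited works. The function name `phase_randomized_surrogate` and the example series are hypothetical.

```python
import numpy as np

def phase_randomized_surrogate(x, rng):
    """Return a series with the same amplitude spectrum as x but random phases."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spec = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spec.size)
    phases[0] = 0.0        # leave the zero-frequency term (the mean) untouched
    if n % 2 == 0:
        phases[-1] = 0.0   # the Nyquist term must stay real for even n
    # Rotating each Fourier coefficient changes its phase but not its amplitude.
    return np.fft.irfft(spec * np.exp(1j * phases), n=n)

# Hypothetical example series: seasonal cycle plus offset and noise.
rng = np.random.default_rng(3)
time = np.arange(96)
series = 0.4 + 0.2 * np.sin(2 * np.pi * time / 12) + 0.05 * rng.normal(size=96)
surrogate = phase_randomized_surrogate(series, rng)
```

Repeating a subsignal extraction on many such surrogates would give a spread of extracted components, which is one conceivable way to express the extraction uncertainty.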

4.3. Dimensionality reduction

The obvious rationale of innovations in the field of nonlinear dimensionality reduction is that corresponding linear techniques are generally unsuitable for real world applications (e.g. Schölkopf et al., 1998; Tenenbaum et al., 2000; Mjolsness and DeCoste, 2001). Despite this insight, linear methods remain a standard in many areas of application; for example, in climatology or environmental sciences (but see Gámez et al., 2004; Hsieh, 2004; Mahecha and Schmidtlein, 2008). One reason might be that earth system sciences are confronted with data volumes of several tens of thousands of samples that often render nonlinear algorithms infeasible.

Nevertheless, nonlinear dimensionality reduction provides important perspectives. But note that comparable methodological reflections made on the time series analysis tools apply to Isomap as well: It is only one out of a wide range of relatively novel techniques, each of which has specific strengths and typical pitfalls (for an overview see van der Maaten et al., 2009). Also in this field, a long-lasting debate on optimal algorithms is far from consensus. Even for the individual algorithms a series of slightly improved variants may exist (in case of Isomap, improvements generally focus on the selection of k-NNs, e.g. Mekuz and Tsotsos, 2006; Meng et al., 2008; Wen et al., 2008), each of which, however, introduces new and partly disputable assumptions to the analysis.
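For orientation, the three steps of a basic Isomap (k-NN graph, geodesic distances, classical multidimensional scaling) can be sketched as below. This is a minimal illustration for small sample sizes only, not the implementation used in this study: geodesic distances are computed with a Floyd–Warshall loop chosen for brevity, which scales cubically and would not be feasible for tens of thousands of samples. The function name `isomap` and the toy data are hypothetical.

```python
import numpy as np

def isomap(X, n_neighbors, n_components):
    """Basic Isomap embedding of the rows of X (small n only)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Step 1: pairwise Euclidean distances and symmetric k-NN graph.
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    graph = np.full((n, n), np.inf)
    nn = np.argsort(d, axis=1)[:, 1:n_neighbors + 1]  # skip the point itself
    rows = np.repeat(np.arange(n), n_neighbors)
    graph[rows, nn.ravel()] = d[rows, nn.ravel()]
    graph = np.minimum(graph, graph.T)
    np.fill_diagonal(graph, 0.0)
    # Step 2: geodesic distances as shortest paths (Floyd-Warshall, O(n^3)).
    for k in range(n):
        graph = np.minimum(graph, graph[:, [k]] + graph[[k], :])
    if not np.isfinite(graph).all():
        raise ValueError("k-NN graph is disconnected; increase n_neighbors")
    # Step 3: classical MDS on the geodesic distance matrix.
    J = np.eye(n) - 1.0 / n                 # centering matrix I - (1/n) 1 1^T
    B = -0.5 * J @ (graph ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

# Toy data (hypothetical): points along three quarters of a circle.
theta = np.linspace(0.0, 1.5 * np.pi, 60)
arc = np.column_stack([np.cos(theta), np.sin(theta)])
emb = isomap(arc, n_neighbors=2, n_components=1)
```

In the toy example the single Isomap coordinate follows the position along the arc, a curved gradient that a linear projection would not unfold. The sensitivity to `n_neighbors` is the reason why many Isomap variants focus on the neighborhood selection.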

5. Conclusion and outlook

Environmental research requires a detailed understanding of the spatiotemporal variability in the investigated biological or geophysical processes. The relevant patterns need to be identified and extracted from the corresponding observational or model data cubes. This paper illustrates how the combination of time series and dimensionality reduction techniques enables the analyst to recover multiple spatiotemporal scales from large data cubes.

In the example application exploring the FAPAR data from 1998 to 2005, the identified geographical patterns can be partly traced back to well known environmental conditions, such as the 1997/1998 ENSO event. Also, geographical long-term changes in the photosynthetic activities are isolated, and some other interesting features emerged that call for in-depth environmental interpretations. This is part of the analysis strategy: Retrieved patterns trigger the generation of well-defined hypotheses awaiting further explanations, especially in relation to ancillary data. Moreover, we expect that the retrieval of multiple spatiotemporal patterns leads to refined comparisons of remote sensing products or climate data from different sources. The proposed technique is also of interest in the context of model ensemble analysis or model–data comparisons. For the latter, spatiotemporal dimensions found in monitoring data can serve as benchmarks in model performance evaluations. As a final remark, we note that this letter is also an attempt to illustrate well-defined demands of earth system sciences to future developments in pattern recognition and machine learning.

Acknowledgments

We gratefully acknowledge M. Reichstein and S.I. Seneviratne for their insightful advice throughout the study. The authors also thank M. Jung, U. Weber, and three anonymous reviewers for very useful comments. LMF and MDM thank the Max Planck Society for supporting the Max Planck research group on “Biogeochemical Model-Data Integration”. This study emerged during the Workshop “Novel data mining strategies for exploring biogeochemical cycles and biosphere–atmosphere interactions”, Jena summer 2009, funded by the Max Planck Society, and was further developed in the CARBO-Extreme project funded by the European Commission (FP7-ENV-2008-1-226701).

References

Angert, A., Biraud, S., Bonfils, C., Henning, C.C., Buermann, W., Pinzon, J.E., Tucker, C.J., Fung, I., 2005. Drier summers cancel out the CO2 uptake enhancement induced by warmer springs. Proc. Natl. Acad. Sci. 102, 823–827.

Anyamba, A., Tucker, C.J., Mahoney, R., 2002. From El Niño to La Niña: Vegetation response patterns over East and Southern Africa during the 1997–2000 period. J. Climate 15, 3096–3103.

Balasubramanian, M., Schwartz, E.L., Tenenbaum, J.B., de Silva, V., Langford, J., 2002. The Isomap algorithm and topological stability. Science 295, 7.

Barbosa, S.M., Silva, M.E., Fernandes, M.J., 2009. Multi-scale variability patterns in NCEP/NCAR reanalysis sea-level pressure. Theoret. Appl. Climatol. 96, 319–326.

Broomhead, D.S., King, G.P., 1986. Extracting qualitative dynamics from experimental data. Physica D 20, 217–236.

Cox, T.F., Cox, M.A.A., 2001. Multidimensional Scaling, vol. 45. Chapman & Hall, Boca Raton.

Dijkstra, E.W., 1959. A note on two problems in connexion with graphs. Numer. Math. 1, 269–271.

Elsner, J.B., Tsonis, A.A., 1996. Singular Spectrum Analysis. A New Tool in Time Series Analysis. Plenum Press, New York.

Gámez, A.J., Zhou, C.S., Kurths, J., 2004. Nonlinear dimensionality reduction in climate data. Nonlinear Process. Geophys. 11, 393–398.

Ghil, M., Allen, M.R., Dettinger, M.D., Ide, K., Kondrashov, D., Mann, M.E., Robertson, A.W., Saunders, A., Tian, Y., Varadi, F., Yiou, P., 2002. Advanced spectral methods for climatic time series. Rev. Geophys. 40, 1003.

Gobron, N., Pinty, B., Aussedat, O., Chen, J., Cohen, W.B., Fensholt, R., Gond, V., Hummerich, K.F., Lavergne, T., Mélin, F., Privette, J.L., Sandholt, I., Taberner, M., Turner, D.P., Verstraete, M.M., Widlowski, J.-L., 2006. Evaluation of fraction of absorbed photosynthetically active radiation products for different canopy radiation transfer regimes: Methodology and results using Joint Research Center products derived from SeaWiFS against ground-based estimations. J. Geophys. Res. 111, D13110.

Golyandina, N., Nekrutkin, V., Zhigljavsky, A., 2001. Analysis of Time Series Structure: SSA and Related Techniques. Monographs on Statistics and Applied Probability No. 90. Chapman & Hall/CRC, Boca Raton.

Heimann, M., Reichstein, M., 2008. Terrestrial ecosystem carbon dynamics and climate feedbacks. Nature 451 (January), 289–292.

Hsieh, W., 2004. Nonlinear multivariate and time series analysis by neural network methods. Rev. Geophys. 42, 1–25.

Huang, N.E., Shen, Z., Long, S.R., Wu, M.L., Shih, H.H., Zheng, Q., Yen, N.C., Tung, C.C., Liu, H.H., 1998. The empirical mode decomposition and Hilbert spectrum for nonlinear and nonstationary time series analysis. Proc. Roy. Soc. London A 454, 903–995.

Huang, N.E., Wu, Z., 2008. A review on Hilbert–Huang transform: Method and its applications to geophysical studies. Rev. Geophys. 6, RG2006.

Jung, M., Verstraete, M., Gobron, N., Reichstein, M., Papale, D., Bondeau, A., Robustelli, M., Pinty, B., 2008. Diagnostic assessment of European gross primary production. Global Change Biol. 14, 2349–2364.

Kogan, F.A., 2000. Satellite-observed sensitivity of world land ecosystems to El Niño/La Niña. Remote Sens. Environ. 74, 445–462.

Kondrashov, D., Ghil, M., 2006. Spatio-temporal filling of missing data points in geophysical data sets. Nonlinear Process. Geophys. 13, 151–159.

Mahecha, M.D., Martínez, A., Lischeid, G., Beck, E., 2007a. Nonlinear dimensionality reduction: Alternative ordination approaches for extracting and visualizing biodiversity patterns in tropical montane forest vegetation data. Ecol. Inform. 2, 138–149.

Mahecha, M.D., Reichstein, M., Jung, M., Seneviratne, S.I., Zaehle, S., Beer, C., Braakhekke, M.C., Carvalhais, N., Lange, H., Le Maire, G., Moors, E., 2010. Comparing observations and process-based simulations of biosphere–atmosphere exchanges on multiple time scales. J. Geophys. Res. Biogeosci. 115, G02003, doi:10.1029/2009JG001016.

Mahecha, M.D., Reichstein, M., Lange, H., Carvalhais, N., Bernhofer, C., Grünwald, T., Papale, D., Seufert, G., 2007b. Characterizing ecosystem–atmosphere interactions from short to interannual time scales. Biogeosciences 4, 743–758.

Mahecha, M.D., Schmidtlein, S., 2008. Revealing biogeographical patterns by nonlinear ordinations and derived anisotropic spatial filters. Global Ecol. Biogeogr. 17, 284–296.

McCallum, I., Wagner, W., Schmullius, C., Shvidenko, A., Obersteiner, M., Fritza, S., Nilsson, S., 2010. Comparison of four global FAPAR datasets over northern Eurasia for the year 2000. Remote Sens. Environ. 114, 941–949.

Mekuz, N., Tsotsos, J., 2006. Parameterless Isomap with adaptive neighborhood selection. In: Franke, K., Müller, K.-R., Nickolay, B., Schäfer, R. (Eds.), DAGM-Symposium, Lecture Notes in Computer Science, vol. 4174. Springer, pp. 364–373.

Meng, D., Leung, Y., Xu, Z., Fung, T., Zhan, Q., 2008. Improving geodesic distance estimation based on locally linear assumption. Pattern Recognition Lett. 29, 862–870.

Mjolsness, E., DeCoste, D., 2001. Machine learning for science: State of the art and future prospects. Science 293, 2051–2055.

New, M., Lister, D., Hulme, M., Makin, I., 2002. A high resolution data set of surface climate over global land areas. Climate Res. 21, 1–25.

Österle, H., Gerstengarbe, F.W., Werner, P.C., 2003. Homogenisierung und Aktualisierung des Klimadatensatzes der Climate Research Unit of East Anglia, Norwich (in German). Terra Nostra 6, 326–329.

Paluš, M., Novotná, D., 2006. Quasi-biennial oscillations extracted from the monthly NAO index and temperature records are phase-synchronized. Nonlinear Process. Geophys. 13, 287–296.

Perry, J.N., Liebhold, A.M., Rosenberg, M.S., Dungan, J., Miriti, M., Citron-Pousty, S., 2002. Illustrations and guidelines for selecting statistical methods for quantifying spatial pattern in ecological data. Ecography 25, 578–600.

Piao, S.L., Ciais, P., Friedlingstein, P., Peylin, P., Reichstein, M., Luyssaert, S., Margolis, H., Fang, J.Y., Barr, A., Chen, A.P., Grelle, A., Hollinger, D.Y., Laurila, T., Lindroth, A., Richardson, A.D., Vesala, T., 2008. Net carbon dioxide losses of northern ecosystems in response to autumn warming. Nature 451, 49–52.

Potter, C., Boriah, S., Steinbach, M., Kumar, V., Klooster, S., 2008. Terrestrial vegetation dynamics and global climate controls. Climate Dynam. 31, 67–78.

Schölkopf, B., Smola, A.J., Müller, K.-R., 1998. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 10, 1299–1319.

Seixas, J., Carvalhais, N., Nunes, C., Benali, A., 2009. Comparative analysis of MODIS-FAPAR and MERIS-MGVI datasets: Potential impacts on ecosystem modeling. Remote Sens. Environ. 113, 2547–2559.

Sen, P.K., 1968. Estimates of the regression coefficient based on Kendall’s tau. J. Amer. Statist. Assoc. 63, 1379–1389.

Stine, A.R., Huybers, P., Fung, I.Y., 2009. Changes in the phase of the annual cycle of surface temperature. Nature 457, 435–440.

Stoy, P.C., Richardson, A.D., Baldocchi, D.D., Katul, G.G., Stanovick, J., Mahecha, M.D., Reichstein, M., Detto, M., Law, B.E., Wohlfahrt, G., Arriga, N., Campos, J., McCaughey, J.H., Montagnani, L., Paw U, K.T., Sevanto, S., Williams, M., 2009. Biosphere–atmosphere exchange of CO2 in relation to climate: A cross-biome analysis across multiple time scales. Biogeosciences 6 (10), 2297–2312.

Tenenbaum, J.B., de Silva, V., Langford, J.C., 2000. A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323.

Torrence, C., Compo, G.P., 1998. A practical guide to wavelet analysis. Bull. Amer. Meteorol. Soc. 79, 61–79.

van der Maaten, L.J.P., Postma, E.O., van den Herik, H.J., 2009. Dimensionality reduction: A comparative review. J. Machine Learn. Res. 10, 1–41.

Venema, V., Bachner, S., Rust, H.W., Simmer, C., 2006. Statistical characteristics of surrogate data based on geophysical measurements. Nonlinear Process. Geophys. 13 (4), 449–466.

Weber, U., Jung, M., Reichstein, M., Beer, C., Braakhekke, M.C., Lehsten, V., Ghent, D., Kaduk, J., Viovy, N., Ciais, P., Gobron, N., Rödenbeck, C., 2009. The interannual variability of Africa’s ecosystem productivity: A multi-model analysis. Biogeosciences 6 (2), 285–295.

Wen, G., Jiang, L., Wen, J., 2008. Using locally estimated geodesic distance to optimize neighborhood graph for isometric data embedding. Pattern Recognition 41, 2226–2236.

Williamson, M., 1978. The ordination of incidence data. J. Ecol. 66, 911–920.

Yiou, P., Sornette, D., Ghil, M., 2000. Data-adaptive wavelets and multi-scale singular-spectrum analysis. Physica D 142, 254–290.

Zhan, H., 2008. Scaling in global ocean chlorophyll fluctuations. Geophys. Res. Lett. 35, L01606.

