DDS 12. December 2011 What is the correct number of break points hidden in a climate record?

Page 1: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

What is the correct number of break points hidden in a climate record?

Page 2: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Defining Breaks

Relocations of climate stations or changes in instrumentation lead to slightly different measurements in different periods.

These small but abrupt changes are called breaks.

Breaks become visible when differences to neighbouring stations are considered.

The reason is that the dominating natural variability is filtered out in this way.

Page 3: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Internal and External Variance

Consider the differences of one station compared to a reference (a kriged ensemble of surrounding stations).

Breaks are defined by abrupt changes in the station-reference time series.

Internal variance: within the subperiods

External variance: between the means of different subperiods

Criterion: Maximum external variance attained by a minimum number of breaks

Page 4: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Decomposition of Variance

n: total number of years
N: number of subperiods
n_i: number of years within subperiod i

The sum of external and internal variance is constant.
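The constancy of this sum can be checked numerically. The following is a minimal sketch, assuming the usual definitions (external variance as the n_i-weighted squared deviation of the subperiod means from the overall mean, internal variance as the squared deviations within each subperiod, both divided by n). The function name and the convention that `breaks` holds the start indices of new subperiods are illustrative assumptions, not taken from the slides.

```python
import random


def variance_decomposition(x, breaks):
    """Return (external, internal, total) variance of x for given breaks.

    `breaks` lists the indices at which a new subperiod starts
    (hypothetical convention chosen for this sketch).
    """
    n = len(x)
    overall_mean = sum(x) / n
    bounds = [0] + sorted(breaks) + [n]
    external = 0.0
    internal = 0.0
    for start, end in zip(bounds[:-1], bounds[1:]):
        segment = x[start:end]
        n_i = len(segment)
        segment_mean = sum(segment) / n_i
        # Weighted squared deviation of the subperiod mean from the overall mean
        external += n_i * (segment_mean - overall_mean) ** 2
        # Squared deviations of the members from their own subperiod mean
        internal += sum((value - segment_mean) ** 2 for value in segment)
    total = sum((value - overall_mean) ** 2 for value in x)
    return external / n, internal / n, total / n


rng = random.Random(0)
series = [rng.gauss(0.0, 1.0) for _ in range(100)]
ext, inter, tot = variance_decomposition(series, [30, 70])
print(ext + inter, tot)  # both numbers agree up to rounding
```

Moving the breaks changes how the total is split between the two parts, but not the total itself.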

Page 5: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Two questions

The title of this talk asks: How many breaks?

Where are they situated?

Testing all permutations is not feasible:

$$\binom{n-1}{k} = \binom{99}{10} = \frac{99 \cdot 98 \cdots 90}{10 \cdot 9 \cdots 1} > 10^{13}$$

The best solution for a fixed number of breaks can be found by Dynamical Programming.

Page 6: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Dynamical Programming (1)

Find the optimum positions for a fixed number of breaks.

Consider not only the complete time series, but all possible truncated variants.

Page 7: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Dynamical Programming (2)

Find the best first break by simply testing all permutations.

Page 8: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Dynamical Programming (3)

Fill up all truncated variants. The internal variance now consists of two parts: that of the truncated variant plus that of the rest.

Important: Variances are additive.

Page 9: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Dynamical Programming (4)

Search for the minimum out of the n candidates.

Page 10: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Dynamical Programming (5)

The 2-breaks optimum for the full length is found.

To begin the search for 3 breaks, we need, as before, the previous solutions for all lengths, including the shorter ones.

This needs $n^2/2$ searches, which for larger numbers of breaks k is much less than testing all $\binom{n}{k}$ permutations.
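The procedure on the preceding slides can be sketched as a standard dynamic-programming segmentation. This is a minimal illustration, not the authors' code: it minimises the internal sum of squares (equivalent to maximising the external variance, since their sum is constant), and the names `segment_cost` and `optimal_breaks` are hypothetical.

```python
def segment_cost(prefix, prefix_sq, start, end):
    """Sum of squared deviations from the segment mean for x[start:end]."""
    length = end - start
    s = prefix[end] - prefix[start]
    sq = prefix_sq[end] - prefix_sq[start]
    return sq - s * s / length


def optimal_breaks(x, k):
    """Return (break positions, minimal internal sum of squares) for k breaks."""
    n = len(x)
    prefix, prefix_sq = [0.0], [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)
        prefix_sq.append(prefix_sq[-1] + value * value)

    inf = float("inf")
    # best[j][t]: minimal internal sum of squares of the truncated series x[:t]
    # when it contains j breaks; last[j][t]: position of its last break.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    last = [[0] * (n + 1) for _ in range(k + 1)]
    for t in range(1, n + 1):
        best[0][t] = segment_cost(prefix, prefix_sq, 0, t)
    for j in range(1, k + 1):
        for t in range(j + 1, n + 1):
            # Truncated variant x[:b] with j-1 breaks, plus the rest x[b:t]
            for b in range(j, t):
                cost = best[j - 1][b] + segment_cost(prefix, prefix_sq, b, t)
                if cost < best[j][t]:
                    best[j][t] = cost
                    last[j][t] = b

    # Walk back through the stored last-break positions
    breaks, t = [], n
    for j in range(k, 0, -1):
        t = last[j][t]
        breaks.append(t)
    breaks.reverse()
    return breaks, best[k][n]


example = [0.0] * 3 + [5.0] * 3 + [1.0] * 3
print(optimal_breaks(example, 2))
```

Each additional break reuses the stored solutions for all truncated lengths, which is why the cost per added break grows only quadratically in n rather than with the number of combinations.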

Page 11: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Position & Number

Solved:

The optimum positions for a fixed number of breaks are known by Dynamical Programming.

Remaining:

Find the optimum number of breaks.

The external variance increases in any case with an increasing number of breaks.

Use as reference the behaviour of a random time series.

Page 12: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Segment averages

Random time series with stddev = 1

Segment averages $\bar{x}_i$ scatter randomly with

mean: 0

stddev: $1/\sqrt{n_i}$

because any deviation from zero can be seen as inaccuracy due to the limited number of members.

Page 13: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

External Variance

The external variance is equal to the mean square sum of a random standard normal distributed variable.

Weighted measure for the variability of the subperiods' means

Page 14: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

χ²-distribution

n: Length of time series (number of years)
k: Number of breaks
N = k+1: Number of subsegments
[ ]: Mean over several break position permutations

[varext] = (N-1)/n = k/n
On average, the external variance increases linearly with k.
However, we consider the best member as found by DP.

varext ~ $\chi^2_N$: The external variance is χ²-distributed.

Def.:

Take N values out of N(0,1), square and add them up.

By repeating this, a $\chi^2_N$-distribution is obtained.
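The linear increase of the mean external variance, [varext] = k/n, can be illustrated with a small simulation. This is a sketch under stated assumptions: unit-variance Gaussian noise, break positions drawn at random (not optimised by DP), and external variance computed as the n_i-weighted squared deviation of segment means from the series mean, divided by n. The function names are illustrative.

```python
import random


def external_variance(x, breaks):
    """External variance of x for the given break positions."""
    n = len(x)
    mean = sum(x) / n
    bounds = [0] + sorted(breaks) + [n]
    total = 0.0
    for start, end in zip(bounds[:-1], bounds[1:]):
        segment_mean = sum(x[start:end]) / (end - start)
        total += (end - start) * (segment_mean - mean) ** 2
    return total / n


def mean_external_variance(n, k, trials, seed=0):
    """Average external variance over random series and random break positions."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        breaks = rng.sample(range(1, n), k)  # k distinct positions
        acc += external_variance(x, breaks)
    return acc / trials


for k in (2, 4, 6):
    print(k, mean_external_variance(21, k, 5000), k / 21)
```

The simulated averages should lie close to k/n. The optimum found by DP, which is the relevant quantity, lies well above this mean.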

Page 15: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

21-years random data (1)

1000 random time series are created.

They are only 21 years long, so that explicit tests of all permutations are possible.

The mean increases linearly.

However, the maximum is relevant

(the best solution as found by DP)

Can we describe this function?

First guess: $v = 1 - (1 - k^*)^4$

Page 16: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

21-years random data (2)

Above, we expected the data for a fixed number of breaks to be χ²-distributed.

Page 17: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

From χ² to β distribution

The random data does not fit a χ²-distribution exactly.

The reason is that χ² has no upper bound.

But varext cannot exceed 1.

A kind of confined χ² is the beta distribution.

(Figure: distribution of the random data for n = 21 years, k = 7 breaks.)

Page 18: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

From χ² to β distribution

(Figure: data for n = 21 years, k = 7 breaks.)

X ~ χ²(a) and Y ~ χ²(b)

X / (X+Y) ~ β(a/2, b/2)

If we normalize a χ²-distributed variable by the sum of itself and another χ²-distributed variable, the result will be β-distributed.

The β-distribution fits the data well and is the theoretical distribution for the external variance of all break position permutations.
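The relation between the two distributions can be checked by sampling. The sketch below uses only Python's standard `random` module and relies on the fact that a χ² variable with a degrees of freedom is a gamma variable with shape a/2 and scale 2. The choice a = 7 and b = 13 (k = 7 and n - 1 - k = 13 for n = 21) is an assumption made to match the slide's example.

```python
import random


def chi2_sample(rng, dof):
    """Draw from a chi-squared distribution via Gamma(dof/2, scale=2)."""
    return rng.gammavariate(dof / 2.0, 2.0)


def ratio_samples(a, b, size, seed=0):
    """Samples of X / (X + Y) with X ~ chi2(a) and Y ~ chi2(b)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(size):
        x = chi2_sample(rng, a)
        y = chi2_sample(rng, b)
        samples.append(x / (x + y))
    return samples


def mean_and_variance(values):
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


a, b = 7, 13  # assumed: k = 7 breaks, n - 1 - k = 13 for n = 21
ratio = ratio_samples(a, b, 20000)
beta_rng = random.Random(1)
beta = [beta_rng.betavariate(a / 2.0, b / 2.0) for _ in range(20000)]

print(mean_and_variance(ratio))
print(mean_and_variance(beta))
```

Both samples should show nearly the same mean and variance, and neither can exceed 1, in contrast to a plain χ² variable.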

Page 19: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

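From χ² to β distribution

(Figure: β-distribution curves, labelled 7, 11, 15 and 17.)

$$p(v) = \frac{v^{\frac{k}{2}-1}\,(1-v)^{\frac{n-k-1}{2}-1}}{B\left(\frac{k}{2}, \frac{n-k-1}{2}\right)} \quad\text{with}\quad B(a,b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}$$

We are interested in the best solution, with the highest external variance, as provided by DP.

We need the exceeding probability for high varext.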

Page 20: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Incomplete Beta Function

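External variance v is β-distributed and depends on n (years) and k (breaks):

$$p(v) = \frac{v^{\frac{k}{2}-1}\,(1-v)^{\frac{n-k-1}{2}-1}}{B\left(\frac{k}{2}, \frac{n-k-1}{2}\right)}$$

The exceeding probability P gives the best (maximum) solution for v:

$$P(v) = 1 - \int_0^v p(v')\,dv'$$

Incomplete Beta Function, solvable for even k and odd n:

$$P(v) = \sum_{l=0}^{i-1} \binom{m}{l}\, v^l\, (1-v)^{m-l} \quad\text{with}\quad i = \frac{k}{2},\quad m = \frac{n-3}{2}$$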

Page 21: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Example 21 years, 4 breaks

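$$P(v) = \sum_{l=0}^{i-1} \binom{m}{l}\, v^l\, (1-v)^{m-l} \quad\text{with}\quad i = \frac{k}{2},\quad m = \frac{n-3}{2}$$

k = 4, so i = 2
n = 21, so m = 9

$$P(v) = (1-v)^9 + 9\,v\,(1-v)^8$$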
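The finite sum for the exceeding probability is easy to evaluate directly. The following sketch assumes the form P(v) = Σ_{l=0}^{i-1} C(m, l) v^l (1-v)^(m-l) with i = k/2 and m = (n-3)/2, valid for even k and odd n; the function name is hypothetical.

```python
from math import comb


def exceeding_probability(v, n, k):
    """Probability that the external variance exceeds v (even k, odd n)."""
    if k % 2 != 0 or n % 2 != 1:
        raise ValueError("closed form requires even k and odd n")
    i = k // 2
    m = (n - 3) // 2
    return sum(comb(m, l) * v**l * (1 - v) ** (m - l) for l in range(i))


def closed_form_21_4(v):
    """Special case n = 21, k = 4: (1 - v)^9 + 9 v (1 - v)^8."""
    return (1 - v) ** 9 + 9 * v * (1 - v) ** 8


for v in (0.1, 0.3, 0.5, 0.7):
    print(v, exceeding_probability(v, 21, 4), closed_form_21_4(v))
```

The general sum and the special case for 21 years and 4 breaks give the same values, with P(0) = 1 and P(1) = 0 as expected for an exceeding probability.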

Page 22: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Theory and Data

Theory (curve):

$$P(v) = (1-v)^9 + 9\,v\,(1-v)^8$$

Random data (hatched) fits well.

Page 23: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Nominal Combination Number

For n = 21 and k = 4 there are

$$\binom{n-1}{k} = \binom{20}{4} = 4845$$

break combinations.

If they were all independent, we could read the maximum external variance at (4845)^-1 ≈ 0.0002, being 0.7350.

However, we suspect that the break combinations are not independent. And we know the correct value of varext.

Page 24: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Effective and Nominal

Remember: varext = 0.5876 for k = 4

The reverse reading leads to a 23 times higher exceeding probability.

This shows that the break permutations are strongly dependent and the effective number of combinations is smaller than the nominal.

However, the theoretical function is correct.
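The factor of 23 can be reproduced from the numbers on the slides. The sketch below assumes the closed form P(v) = (1-v)^9 + 9v(1-v)^8 for n = 21 and k = 4, the nominal count C(20, 4) = 4845, and the reported value varext = 0.5876. The variable names are illustrative.

```python
from math import comb


def exceeding_probability_21_4(v):
    """Exceeding probability for n = 21 years and k = 4 breaks."""
    return (1 - v) ** 9 + 9 * v * (1 - v) ** 8


nominal_combinations = comb(20, 4)            # 4845 break combinations
nominal_probability = 1.0 / nominal_combinations

observed_varext = 0.5876                      # value quoted on the slide
observed_probability = exceeding_probability_21_4(observed_varext)

# Ratio between the exceeding probability read at the known varext
# and the one expected for independent combinations
ratio = observed_probability / nominal_probability
effective_combinations = 1.0 / observed_probability

print(ratio, effective_combinations)
```

The ratio comes out at about 23, so the effective number of combinations is roughly 4845 / 23, about 210.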

Page 25: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

From 21 years to 101 years

As we now know the theoretical function, we quit the explicit check by random data

and skip from unrealistically short time series (n = 21) to more realistic ones (n = 101).

Again, the numerical values of the external variance are known and we can conclude the effective combination numbers.

Can we give a formula for dv/dk in order to derive v(k)?

(Figure: x-axis "breaks", ticks 2 and 20.)

Page 26: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

dv/dk sketch

Increasing the break number from k to k+1 has two consequences:

1. The probability function changes.

2. The number of combinations increases.

Both increase the external variance.

(Figure: curves for k breaks and k+1 breaks.)

Page 27: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Using the Slope

P(v) is a complicated function and hard to invert into v(P).

Thus, dv is concluded from dP / slope.

We just derived P(v) by integrating p(v), so that the slope p(v) is known.

(Figure: curves for k breaks and k+1 breaks.)

$$P(v) = \sum_{l=0}^{i-1} \binom{m}{l}\, v^l\, (1-v)^{m-l}$$

Page 28: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

The Slope

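$$\frac{d}{dv}\ln P(v) = -\frac{p(v)}{P(v)}$$

Insert the known functions:

$$\frac{d}{dv}\ln P(v) = -\frac{(m-i+1)\binom{m}{i-1}\, v^{i-1}\,(1-v)^{m-i}}{\sum_{l=0}^{i-1}\binom{m}{l}\, v^l\,(1-v)^{m-l}}$$

The last summand dominates:

$$\frac{d}{dv}\ln P(v) \approx -\frac{(m-i+1)\binom{m}{i-1}\, v^{i-1}\,(1-v)^{m-i}}{\binom{m}{i-1}\, v^{i-1}\,(1-v)^{m-i+1}}$$

Reduce and replace m and i:

$$\frac{d}{dv}\ln P(v) \approx -\frac{m-i+1}{1-v}$$

$$\frac{d}{dv}\ln P(v) \approx -\frac{1}{2}\,\frac{n-1-k}{1-v}$$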

Page 29: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Distance between the Curves

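$$\ln P_{i+1} - \ln P_i = \ln\sum_{l=0}^{i}\binom{m}{l}\, v^l\,(1-v)^{m-l} - \ln\sum_{l=0}^{i-1}\binom{m}{l}\, v^l\,(1-v)^{m-l}$$

The last summand dominates:

$$\ln P_{i+1} - \ln P_i \approx \ln\left[\binom{m}{i}\, v^{i}\,(1-v)^{m-i}\right] - \ln\left[\binom{m}{i-1}\, v^{i-1}\,(1-v)^{m-i+1}\right]$$

$$\ln P_{i+1} - \ln P_i \approx \ln\left(\frac{m-i+1}{i}\cdot\frac{v}{1-v}\right)$$

Reduce and replace m and i:

$$\ln P_{k+1} - \ln P_k \approx \frac{1}{2}\ln\left(\frac{n-1-k}{k}\cdot\frac{v}{1-v}\right)$$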

Page 30: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Effective combination growth

Nominal growth rate of the number of combinations $\binom{n-1}{k}$: (n-1-k) / k

-2 ln((n-1-k) / k)

ln: logarithmic sketch
minus: the number of combinations is reciprocal to the exceeding probability
2: the exceeding probability is only known for even break numbers

However, break combinations are not independent, and we know the effective number of combinations.

Page 31: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Ratio: nominal / effective

k1   k2   k    nominal   effective   c = nom/eff
 2    4    3   -2.552    -7.784      0.328
 4    6    5   -2.186    -6.952      0.315
 6    8    7   -1.963    -6.356      0.309
 8   10    9   -1.765    -5.889      0.300
10   12   11   -1.645    -5.503      0.299
12   14   13   -1.514    -5.173      0.293
14   16   15   -1.435    -4.885      0.294
16   18   17   -1.363    -4.627      0.295
18   20   19   -1.292    -4.394      0.295

The ratio of nominal / effective is approximately constant with c = 0.3

Page 32: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Very Rough Solution

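$$\frac{n-1-k}{2\,(1-v)}\,\frac{dv}{dk} = c\,\ln\frac{n-1-k}{k} + \frac{1}{2}\ln\left(\frac{n-1-k}{k}\cdot\frac{v}{1-v}\right)$$

Normalisation: $k^* = \frac{k}{n-1}$

$$\frac{1-k^*}{1-v}\,\frac{dv}{dk^*} = 2c\,\ln\frac{1-k^*}{k^*} + \ln\left(\frac{(1-k^*)\,v}{k^*\,(1-v)}\right)$$

For small k*, with $v / k^* = 4$:

$$2 \cdot 0.3 \cdot \ln(100) + \ln(4) = 2.76 + 1.39 = 4.15 \quad\text{for } n = 100$$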

Page 33: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

The Two Contributions

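$$\frac{1-k^*}{1-v}\,\frac{dv}{dk^*} = 2c\,\ln\frac{1-k^*}{k^*} + \ln\left(\frac{(1-k^*)\,v}{k^*\,(1-v)}\right)$$

For small k*: $v / k^* = 4$

(Figure: curves labelled "truth" and "estimate".)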

Page 34: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Exact Solution

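(Equations: the relation for $\frac{1-k^*}{1-v}\,\frac{dv}{dk^*}$, containing the terms $\frac{1}{2}\ln(\cdot)$ and $2\ln 5$, its separation into $dv$ and $dk^*$, and the integrated solution for $1 - v$ as a function of $k^*$.)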

Page 35: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Constancy of Solution

(Figure: curves for 101 years and 21 years.)

The solution for the exponent is constant for different lengths of time series (21 and 101 years).

Page 36: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

The existing algorithm Prodige

Original formulation of Caussinus & Lyazrhi for the penalty term as adopted by Mestre for Prodige:

$$C_k(Y) = \ln\left(1 - \frac{\sum_{j=1}^{k+1} n_j\,(\bar{Y}_j - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}\right) + \frac{2\,l_k}{n-1}\,\ln(n) \rightarrow \min$$

Translation into terms used by us:

$$\ln(1-v) + \frac{2k}{n-1}\,\ln n \rightarrow \min$$

Normalisation by k* = k / (n-1):

$$\ln(1-v) + 2k^*\,\ln n \rightarrow \min$$

Derivation to get the minimum:

$$-\frac{1}{1-v}\,\frac{dv}{dk^*} + 2\ln n = 0$$

$$\frac{1}{1-v}\,\frac{dv}{dk^*} = 2\ln n$$

In Prodige it is postulated that the relative gain of external variance is a constant for given n.
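A penalised criterion of this kind can be evaluated for any candidate segmentation. The sketch below assumes the form ln(1 - v) + 2k/(n-1) · ln(n), where v is the external variance relative to the total variance and k is the number of breaks. It is an illustration of that formula only, not the Prodige implementation, and the function name is hypothetical.

```python
from math import log


def penalised_criterion(y, breaks):
    """ln(1 - v) + 2 k / (n - 1) * ln(n) for the given break positions."""
    n = len(y)
    k = len(breaks)
    mean = sum(y) / n
    bounds = [0] + sorted(breaks) + [n]
    between = 0.0
    for start, end in zip(bounds[:-1], bounds[1:]):
        segment_mean = sum(y[start:end]) / (end - start)
        between += (end - start) * (segment_mean - mean) ** 2
    total = sum((value - mean) ** 2 for value in y)
    v = between / total  # relative external variance
    return log(1.0 - v) + 2.0 * k / (n - 1) * log(n)


# Assumed toy series: small alternating noise with one jump of size 5
series = [0.1, -0.1] * 10 + [5.1, 4.9] * 10

no_break = penalised_criterion(series, [])
true_break = penalised_criterion(series, [20])
wrong_break = penalised_criterion(series, [5])
print(no_break, true_break, wrong_break)
print(2 * log(101))  # penalty slope 2 ln(n) for n = 101
```

Without breaks the criterion is zero. A break is accepted only if the gain in ln(1 - v) outweighs the penalty, whose slope 2 ln(n) depends on n alone.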

Page 37: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Our Results vs Prodige

We know the function for the relative gain of external variance.

Its uncertainty, as given by isolines of exceeding probabilities for $2^{-i}$, is characterised by constant distances.

Caussinus and Lyazrhi (adopted by Mestre) propose just a constant of 2 ln(n) ≈ 9.

Exceeding probability: 1/128, 1/64, 1/32, 1/16, 1/8, 1/4

Page 38: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Wrong Direction

n = 101 years n = 21 years

Page 39: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Conclusion

We have found a general mathematical formulation of how the external variance of a random time series increases when more and more breaks, as given by Dynamical Programming, are inserted.

This is much more accurate than existing estimates and can be used in future as a reference to define the optimum number of breaks.

Page 40: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Integrated result

What does the function found look like after integration?

Crosses: Test data

Line: Theory

Error bars: 90th and 95th percentiles

Page 41: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Appendix (1)

Consider the individual summands of the sum as defined in

$$P(v) = \sum_{l=0}^{i-1}\binom{m}{l}\, v^l\,(1-v)^{m-l}$$

The factor of change f between a certain summand and its successor is:

$$f = \frac{\binom{m}{l_i}\, v}{\binom{m}{l_i-1}\,(1-v)}$$

where $l_i$ runs from zero to i. The ratio of consecutive binomial coefficients can be replaced and it follows:

$$f = \frac{m-l_i+1}{l_i}\cdot\frac{v}{1-v}$$

m and i can be replaced by n and k:

$$f = \frac{n-1-l_k}{l_k}\cdot\frac{v}{1-v}$$

Inserting k instead of $l_k$ gives a lower limit for f, because $(n-1-l_k)/l_k$, the rate of change of the binomial coefficients, decreases monotonically with $l_k$:

$$f \ge \frac{n-1-k}{k}\cdot\frac{v}{1-v}$$

Normalised by 1/(n-1):

$$f \ge \frac{1-k^*}{k^*}\cdot\frac{v}{1-v}$$

Page 42: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Appendix (2)

The approximate solution is known with $1 - v = (1 - k^*)^4$:

$$f \ge \frac{1-k^*}{k^*}\cdot\frac{1-(1-k^*)^4}{(1-k^*)^4}$$

$$f \ge \frac{1-(1-k^*)^4}{k^*\,(1-k^*)^3} = \frac{4 - 6k^* + 4k^{*2} - k^{*3}}{(1-k^*)^3}$$

$$f \to 4 \ \text{for}\ k^* \to 0, \qquad f \to \infty \ \text{for}\ k^* \to 1$$

We can conclude that each element of the sum given above is larger by a factor f than the prior element. For small k* the factor f is greater than about 4 and grows to infinity for large k*. Consequently, we can approximate the sum by its last summand according to:

$$P(v) = \sum_{l=0}^{i-1}\binom{m}{l}\, v^l\,(1-v)^{m-l} \approx \binom{m}{i-1}\, v^{i-1}\,(1-v)^{m-i+1}$$
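How well the last summand represents the whole sum can be checked numerically. This is a sketch under stated assumptions: n = 101 and k = 10 are chosen for illustration, v is taken from the approximate solution 1 - v = (1 - k*)^4 with k* = k/(n-1), and the function name is hypothetical.

```python
from math import comb


def summands(v, n, k):
    """Individual terms C(m, l) v^l (1 - v)^(m - l) for l = 0 .. i-1."""
    i = k // 2
    m = (n - 3) // 2
    return [comb(m, l) * v**l * (1 - v) ** (m - l) for l in range(i)]


n, k = 101, 10                      # assumed example values
k_star = k / (n - 1)
v = 1 - (1 - k_star) ** 4           # approximate solution used in the appendix

terms = summands(v, n, k)
growth_factors = [terms[l] / terms[l - 1] for l in range(1, len(terms))]
last_share = terms[-1] / sum(terms)

print(growth_factors)
print(last_share)
```

For these values every summand is more than 4 times larger than its predecessor, and the last summand accounts for roughly 84% of the sum, which is the basis for dropping the other terms in the logarithmic derivation.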

Page 43: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Application (1)

Insert 5 breaks of variance 1 in each of 1000 random time series.

The change of external variance for low break numbers (1, 2, 3 up to about 10) increases,

lying above the theoretical function for random time series without any break (arrow).

Variances of break numbers higher than 5 increase, because the inserted 5 breaks are not always the biggest.

Page 44: DDS – 12. December 2011 What is the correct number of break points hidden in a climate record?

Application (2)

Stop the break search when the growth rate of the external variance first drops below the theoretical one for zero breaks.

One example out of 1000 test time series

Crosses: Observations

Thin line: Inserted breaks

Fat line: Detected breaks

On average over 1000 samples:

Added variance: 86%

(theoretically 5/6)

Remaining after correction: 27%

Average detected break number 5.48