Mixture of normal, gamma and Gumbel distributions

Luc Perreault (Hydro-Québec Research Institute)
James Merleau (Hydro-Québec Research Institute)
Guillaume Évin (University of Newcastle)

Workshop on Statistical Methods for Meteorology and Climate Change, 12-14 January 2011



Outline

> Motivation

> Models, estimation and selection

> Applications

• Spring floods

• Tree-rings time series

> Conclusions and future work

Motivation

K-component mixture: Titterington et al. (1985), Robert (1996), Stephens (2000), Marin and Robert (2007)

$$ g(y_t \mid \mathbf{w}, \boldsymbol{\theta}, d) = \sum_{k=0}^{K-1} w_k \, f(y_t \mid \boldsymbol{\theta}_k, d) $$

where

• $y_t$: observation at time t

• $w_k$: probability that $y_t$ belongs to population k

• $f(y_t \mid \boldsymbol{\theta}_k, d)$: probability density function for a given family d with vector of parameters $\boldsymbol{\theta}_k$
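As a rough illustration of the K-component mixture density defined on this slide, the following Python sketch evaluates g(y) as a weighted sum of component densities. The helper names (`normal_pdf`, `mixture_pdf`) and all numerical values are hypothetical and chosen only for illustration; they are not taken from the presentation.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(y, weights, params, component_pdf):
    """g(y | w, theta, d) = sum_k w_k * f(y | theta_k, d)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * component_pdf(y, *theta) for w, theta in zip(weights, params))

# Hypothetical two-component example (illustrative values only).
weights = [0.6, 0.4]
params = [(2000.0, 300.0), (3000.0, 250.0)]  # (mean, sd) for each component
density = mixture_pdf(2500.0, weights, params, normal_pdf)
```

Passing the component density as an argument keeps the same function usable for any family d, provided a density with the matching parameter vector is supplied.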

Motivation

Model flexibility

> Probabilistic structure of some observed hydrometeorological data can be complex

• Hydrologists use elaborate parameterised distributions (log-Pearson Type 3, Wakeby, Kappa, Halphen, …)

• Some authors have also proposed nonparametric approaches (Adamowski, 1985; O'Connell, 2005)

[Figure: Spring volumes of the Aux Écorces River (m3/s). Panels: histogram of volumes (frequency); time series of volumes (hm³), 1970 to 2000.]

Motivation

Heterogeneity

> Mixture components can be interpreted

> Mixture models formally address the problem of non-homogeneous data

[Figure: Spring peak discharge of the Moisie River (m³/s). Panels: histogram of spring flood peaks (frequency); time series of spring flood peaks (m³/s), 1970 to 2000.]

Motivation

> Significant changepoints in annual inflows for numerous watersheds in Québec

> Confirmed in recent studies

• Perreault et al. (2000, 2007)

• Jandhyala et al. (2009)

• Saïd et al. (2009)

• Jalbert et al. (2010)

[Figure from Jalbert et al. (2010). Legend: No changepoints / Changepoints.]

Motivation

Objectives

> Alternatives to normal mixtures

• gamma and Gumbel

> With and without persistence

> Automatic model selection

> Bayesian framework

[Figure: example density shapes for normal, gamma and Gumbel mixtures (panels grouped by family).]

Models, estimation and selection

Introducing latent variables (i.d. case)

$$ g(y_t \mid w, \boldsymbol{\theta}, d) = w \, f(y_t \mid \boldsymbol{\theta}_0, d) + (1 - w) \, f(y_t \mid \boldsymbol{\theta}_1, d) $$

$$ g(y_t \mid z_t, \boldsymbol{\theta}, d) = f(y_t \mid \boldsymbol{\theta}_0, d)^{1 - z_t} \, f(y_t \mid \boldsymbol{\theta}_1, d)^{z_t} $$

Prior: $\pi(\boldsymbol{\theta}_k, w)$

Latent variables: $z_t \mid w \sim \mathrm{Bern}(w)$, with $w = \Pr[z_t = 0]$

Observations: $y_t \mid z_t = k, \boldsymbol{\theta}_k \sim f(y_t \mid \boldsymbol{\theta}_k, d)$

[Diagram: w → latent variables z_1, …, z_n → observations y_1, …, y_n, with θ_{z_t} entering each y_t.]

EM or MCMC
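To make the role of the latent variables concrete, here is a minimal Python sketch of the classification step for the two-component i.d. case. By Bayes' rule, Pr(z_t = 0 | y_t) = w f(y_t | θ_0) / [w f(y_t | θ_0) + (1 − w) f(y_t | θ_1)], and one sweep of a Gibbs sampler would draw each z_t from this probability. The function names, the use of normal components and the numerical values are assumptions made for illustration, not details from the slides.

```python
import math
import random

def normal_pdf(y, mean, sd):
    """Normal density, used here as a stand-in for f(y | theta_k, d)."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def prob_component0(y, w, theta0, theta1, pdf):
    """Pr(z_t = 0 | y_t, w, theta) for the two-component i.d. mixture."""
    a = w * pdf(y, *theta0)
    b = (1.0 - w) * pdf(y, *theta1)
    return a / (a + b)  # assumes y is not so extreme that both terms underflow

def sample_latent(ys, w, theta0, theta1, pdf, rng):
    """Draw every z_t from its full conditional (one latent-variable Gibbs step)."""
    return [0 if rng.random() < prob_component0(y, w, theta0, theta1, pdf) else 1
            for y in ys]

# Hypothetical values, for illustration only.
w, theta0, theta1 = 0.6, (2000.0, 300.0), (3000.0, 250.0)
z = sample_latent([1900.0, 2500.0, 3100.0], w, theta0, theta1, normal_pdf,
                  random.Random(0))
```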

Models, estimation and selection

Introducing latent variables (d. case)

Latent variables: $z_t \mid z_{t-1}, \mathbf{W} \sim \mathrm{Markov}(\mathbf{W})$

$$ \mathbf{W} = \begin{pmatrix} w_{00} & w_{01} \\ w_{10} & w_{11} \end{pmatrix} = \begin{pmatrix} \Pr(z_t = 0 \mid z_{t-1} = 0) & \Pr(z_t = 1 \mid z_{t-1} = 0) \\ \Pr(z_t = 0 \mid z_{t-1} = 1) & \Pr(z_t = 1 \mid z_{t-1} = 1) \end{pmatrix} $$

[Diagram: W → latent variables z_1, …, z_n → observations y_1, …, y_n, with θ_{z_t} entering each y_t.]

EM or MCMC
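The dependent case can be illustrated by simulating a two-state Markov chain of regimes and then drawing each observation from the component selected by z_t. This is only a sketch under assumed values: the transition matrix, the normal components and the function names are hypothetical.

```python
import random

def simulate_regimes(n, W, rng, z0=0):
    """Simulate z_1..z_n from a two-state Markov chain with transition matrix W."""
    z = [z0]
    for _ in range(n - 1):
        prev = z[-1]
        # W[prev][1] = Pr(z_t = 1 | z_{t-1} = prev)
        z.append(1 if rng.random() < W[prev][1] else 0)
    return z

def simulate_series(z, means, sds, rng):
    """Draw y_t from the (normal) component indicated by z_t."""
    return [rng.gauss(means[k], sds[k]) for k in z]

# Hypothetical persistent regimes: large diagonal entries mean long runs.
W = [[0.9, 0.1],
     [0.2, 0.8]]
rng = random.Random(1)
z = simulate_regimes(500, W, rng)
y = simulate_series(z, means=[2000.0, 3000.0], sds=[300.0, 250.0], rng=rng)
```

With large diagonal entries in W the simulated regimes persist over several consecutive time steps, which is the behaviour the "with persistence" models are meant to capture.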

Models, estimation and selection

$$ g(y_t \mid \mathbf{w}, \boldsymbol{\theta}, d) = \sum_{k=0}^{K-1} w_k \, f(y_t \mid \boldsymbol{\theta}_k, d) $$

Parameterization: Nelder and Wedderburn (1972)

> normal (d = 1)

$$ f(y_t \mid \zeta, \phi, d = 1) = \sqrt{\frac{\phi}{2\pi}} \exp\left\{ -\frac{\phi}{2} (y_t - \zeta)^2 \right\}, \quad -\infty < y_t < \infty $$

> gamma (d = 2)

$$ f(y_t \mid \zeta, \phi, d = 2) = \frac{\phi^{\phi}}{\zeta^{\phi} \, \Gamma(\phi)} \, y_t^{\phi - 1} \exp\left( -\phi y_t / \zeta \right), \quad 0 \le y_t < \infty $$

> Gumbel (d = 3)

$$ f(y_t \mid \zeta, \phi, d = 3) = \phi \exp\left\{ -\phi (y_t - \zeta) - c_E - \exp\left[ -\phi (y_t - \zeta) - c_E \right] \right\} $$
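The three component densities can be sketched in Python with the shared (ζ, φ) parameterization. Two assumptions are made here and should be checked against the paper: c_E is taken to be Euler's constant, which makes ζ the mean of the Gumbel component just as it is for the normal and gamma components, and the gamma density is set to zero at y = 0 for simplicity. The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # assumed meaning of c_E (Euler's constant)

def f_normal(y, zeta, phi):
    """Normal density with mean zeta and precision phi (d = 1)."""
    return math.sqrt(phi / (2.0 * math.pi)) * math.exp(-0.5 * phi * (y - zeta) ** 2)

def f_gamma(y, zeta, phi):
    """Gamma density with mean zeta and shape phi (d = 2)."""
    if y <= 0.0:
        return 0.0  # simplification: boundary value treated as zero density
    log_f = (phi * math.log(phi) - phi * math.log(zeta) - math.lgamma(phi)
             + (phi - 1.0) * math.log(y) - phi * y / zeta)
    return math.exp(log_f)

def f_gumbel(y, zeta, phi):
    """Gumbel density with mean zeta and inverse scale phi (d = 3)."""
    u = phi * (y - zeta) + EULER_GAMMA
    return phi * math.exp(-u - math.exp(-u))  # may overflow for very negative u
```

Under this reading, ζ plays the same role (the component mean) in all three families, which makes parameter estimates comparable across families.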

Models, estimation and selection

Estimation : MCMC

> Normal

• Conjugate priors for all parameters

• Gibbs sampling

> Gamma

• Use of the Stirling approximation: conjugate priors for all parameters

• Gibbs sampling

> Gumbel

• No conjugate prior for 1 parameter

• Metropolis-Hastings within Gibbs sampling
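A Metropolis-Hastings step inside a Gibbs sweep can be sketched as follows for a single non-conjugate parameter. The slides do not say which Gumbel parameter lacks a conjugate prior, so this example makes several assumptions for illustration: it updates the inverse scale φ with a random walk on log φ, holds ζ fixed, uses a prior that is flat in log φ, and takes c_E to be Euler's constant. It is not the authors' sampler.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # assumed value of c_E (Euler's constant)

def gumbel_loglik(ys, zeta, phi):
    """Log-likelihood of a Gumbel sample with mean zeta and inverse scale phi."""
    total = 0.0
    for y in ys:
        u = phi * (y - zeta) + EULER_GAMMA
        total += math.log(phi) - u - math.exp(-u)
    return total

def mh_update_phi(ys, zeta, phi, step, rng):
    """One random-walk Metropolis-Hastings update of phi on the log scale."""
    proposal = phi * math.exp(step * rng.gauss(0.0, 1.0))
    # With a prior flat in log(phi) (an assumption) and a symmetric random walk
    # on log(phi), the acceptance ratio reduces to the likelihood ratio.
    log_ratio = gumbel_loglik(ys, zeta, proposal) - gumbel_loglik(ys, zeta, phi)
    if math.log(1.0 - rng.random()) < log_ratio:
        return proposal, True
    return phi, False

def simulate_gumbel(n, zeta, phi, rng):
    """Draw n values with mean zeta: y = zeta - (c_E + log E) / phi, E ~ Exp(1)."""
    return [zeta - (EULER_GAMMA + math.log(rng.expovariate(1.0))) / phi
            for _ in range(n)]

# Hypothetical run on simulated data (true phi = 1.0).
rng = random.Random(0)
ys = simulate_gumbel(200, zeta=10.0, phi=1.0, rng=rng)
phi, draws = 2.0, []
for _ in range(2000):
    phi, _accepted = mh_update_phi(ys, 10.0, phi, 0.15, rng)
    draws.append(phi)
```

In a full sampler this update would alternate with Gibbs draws of the remaining parameters and of the latent variables z_t.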

Models, estimation and selection

Model selection

> Which family of probability distributions should be considered?

> How many components?

• Have significant changepoints occurred in the time series?

• How many changepoints and switching regimes occurred?

Models, estimation and selection

Model selection

> Evaluation of the marginal densities [Évin et al., 2010]

$$ m(\mathbf{y}) = \int \prod_{t=1}^{n} \left( \sum_{k=0}^{K-1} w_k \, f(y_t \mid \boldsymbol{\theta}_k, d) \right) \pi(\boldsymbol{\Lambda}) \, d\boldsymbol{\Lambda} $$

where $\boldsymbol{\Lambda}$ = (all the parameters and latent variables)

• Chib (1995) when full conditional posterior distributions are directly sampled

• Chib and Jeliazkov (2001) generalize this method when full conditional posterior distributions contain an unknown constant of proportionality.
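Chib's (1995) approach rests on the identity log m(y) = log f(y | Λ*) + log π(Λ*) − log π(Λ* | y), which holds at any point Λ*. The sketch below checks this identity in a deliberately simple conjugate model (normal data with known standard deviation and a normal prior on the mean), where the posterior ordinate is available in closed form. In the mixture setting that ordinate has to be estimated from MCMC output instead, which is not shown here. The model, function names and data values are assumptions for illustration.

```python
import math

def log_normal_pdf(x, mean, sd):
    """Log-density of a normal distribution."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def log_marginal_chib(ys, mu_star, sigma, m0, s0):
    """log m(y) = log f(y | mu*) + log pi(mu*) - log pi(mu* | y)."""
    n = len(ys)
    post_prec = 1.0 / s0 ** 2 + n / sigma ** 2
    post_mean = (m0 / s0 ** 2 + sum(ys) / sigma ** 2) / post_prec
    post_sd = math.sqrt(1.0 / post_prec)
    log_lik = sum(log_normal_pdf(y, mu_star, sigma) for y in ys)
    log_prior = log_normal_pdf(mu_star, m0, s0)
    log_post = log_normal_pdf(mu_star, post_mean, post_sd)
    return log_lik + log_prior - log_post

# Hypothetical data; the result should not depend on the evaluation point mu*.
ys = [0.8, 1.1, 1.3, 0.9, 1.2]
lm_a = log_marginal_chib(ys, mu_star=0.3, sigma=0.5, m0=0.0, s0=2.0)
lm_b = log_marginal_chib(ys, mu_star=1.7, sigma=0.5, m0=0.0, s0=2.0)
```

In practice Λ* is usually taken at a high-density point of the posterior, since the posterior ordinate is estimated most accurately there.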

Applications : Spring flood

Peaks Moisie River

Log-marginals log[m(y)] for the Moisie River spring flood peaks

k    Dep.   normal   gamma   Gumbel
1    NO     -279     -279    -281
2    NO     -281     -280    -281
2    YES    -279     -277    -279

[Figure: histogram of spring flood peaks (frequency); time series of spring flood peaks (m³/s) by year, 1970 to 2005.]
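Log-marginals can be turned into posterior model probabilities, which makes the comparison easier to read. The sketch below assumes equal prior probabilities for all candidate models (an assumption not stated on the slide) and uses the Moisie River log-marginals as read from the table on this slide, with the third row taken to be the two-component model with dependence.

```python
import math

# (family, k, dependence) -> log m(y), as read from the Moisie River table.
log_marginals = {
    ("normal", 1, False): -279.0, ("gamma", 1, False): -279.0, ("Gumbel", 1, False): -281.0,
    ("normal", 2, False): -281.0, ("gamma", 2, False): -280.0, ("Gumbel", 2, False): -281.0,
    ("normal", 2, True): -279.0, ("gamma", 2, True): -277.0, ("Gumbel", 2, True): -279.0,
}

def posterior_model_probs(log_m):
    """Posterior model probabilities under equal prior model probabilities."""
    top = max(log_m.values())  # subtract the maximum to avoid underflow
    weights = {model: math.exp(value - top) for model, value in log_m.items()}
    total = sum(weights.values())
    return {model: weight / total for model, weight in weights.items()}

probs = posterior_model_probs(log_marginals)
best = max(probs, key=probs.get)
```

A difference of 2 in log-marginal corresponds to a Bayes factor of about exp(2) ≈ 7.4 between two models.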

Applications : Spring flood

Peaks Moisie River

Inference about the parameters for the gamma mixture with persistence

[Figure: histogram of spring flood peaks (frequency); time series of spring flood peaks (m³/s) by year, 1970 to 2005.]

Applications : Spring flood

Peaks Moisie River

[Figure, Fit: mixture of gamma distributions with persistence; histogram of spring flood peaks (frequency).]

[Figure, Classification: spring flood peaks (m³/s) by year, 1965 to 2005, with Prob(Z = 0 | y) and Prob(Z = 1 | y) by year.]


Applications : Tree-rings

> Have the changepoints and regimes observed in hydrometeorological data really occurred?

> No more than 50 years of data!

> Archives Project [Bégin et al., 2010]


Black spruce tree-ring time series [Bégin et al., 2010]

Applications : Tree-rings

Applications : Tree-rings

Model selection : How many components ?

Log-marginals log[m(y)] for tree-rings time series, by site (normal mixtures with persistence)

K    CANE   DA1M   DA1X   HM1   HM2   ROZI   ROZM   ROZX   HER, RT630 (unseparated)
1      84    -82    140     0   -26     59    -18    104   -201
2      93    -31    159    94    47     80     50    135   131
3      93    -26    159    95    59     82     66    135   544
4      92    -29    155    93    57     80     64    126   140

Applications : Tree-rings

Inference : site ROZM

[Figure, Classification: standardized tree-ring width by year, 1850 to 2004, with Pr(z_t = 0), Pr(z_t = 1) and Pr(z_t = 2) by year.]

[Figure, Parameters: posterior density of Mean; posterior density of Sigma.]


Applications : Tree-rings

Inference : site ROZM

Smoothing with 90% credibility interval

$$ \widehat{sm}_t = \sum_{k=0}^{K-1} \Pr(z_t = k) \, \mu_k $$

[Figure: standardized tree-ring width by year, 1840 to 2020, with the smoothed series.]

[Figure, Parameters: posterior density of Mean; posterior density of Sigma.]
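The smoothing formula on this slide is a probability-weighted average of the component means at each time step. A minimal Python sketch, with hypothetical classification probabilities and component means, and function names introduced only for illustration:

```python
def smoothed_mean(probs_t, means):
    """sm_t = sum_k Pr(z_t = k) * mu_k for a single time step."""
    return sum(p * m for p, m in zip(probs_t, means))

def smooth_series(probs, means):
    """Apply the smoothing formula to every time step."""
    return [smoothed_mean(probs_t, means) for probs_t in probs]

# Hypothetical three-component example: one row of probabilities per year.
probs = [[1.0, 0.0, 0.0],
         [0.5, 0.5, 0.0],
         [0.0, 0.0, 1.0]]
means = [0.8, 1.0, 1.4]
smoothed = smooth_series(probs, means)
```

Applying the same function to each MCMC draw of the probabilities and means would give a sample of smoothed values per year, from which a credibility interval can be read off as empirical quantiles.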

Applications : Tree-rings

Inference : site DA1X

[Figure, Classification: standardized tree-ring width by year, 1850 to 2004, with Pr(z_t = 0), Pr(z_t = 1) and Pr(z_t = 2) by year.]

[Figure, Parameters: posterior density of Mean; posterior density of Sigma.]


Applications : Tree-rings

Inference : site DA1X

Smoothing with 90% credibility interval

$$ \widehat{sm}_t = \sum_{k=0}^{K-1} \Pr(z_t = k) \, \mu_k $$

[Figure: standardized tree-ring width by year, 1840 to 2020, with the smoothed series.]

[Figure, Parameters: posterior density of Mean; posterior density of Sigma.]


Conclusions

> Alternatives to normal mixtures

• better fit to environmental r.v.

• parsimony

> Automatic model selection

• normal, gamma and Gumbel mixtures

• i.d. and with Markovian dependence

• number of components

> Can be generalized to other distributions

References

> Évin, G., Merleau, J. and Perreault, L. (2010). Mixtures of normal, gamma, and Gumbel distributions. Submitted to Water Resources Research.

> Perreault, L., Garçon, R. and Gaudet, J. (2007). Modelling hydrologic time series using regime switching models and measures of atmospheric circulation. La Houille Blanche, 6: 111-123.