Page 1: Extreme values

Extreme values

Adam Butler
Biomathematics & Statistics Scotland

Seminar at MLURI, January 2008

Page 2: Extreme values

Motivation

1. Motivation
What is EVT?
Applications
Current research

Page 3: Extreme values

Flooding, Budapest, 2002

Graham Berry

http://en.wikipedia.org/wiki/Image:Floods_in_Budapest_2002.jpg

Page 4: Extreme values

What is the probability that the flood defenses of Budapest will be overtopped during 2008?

Page 5: Extreme values

Northern Rock branch, London, 2007

Alex Gunningham

http://en.wikipedia.org/wiki/Image:1378965141_7817eb7212_o.jpg

Page 6: Extreme values

What is the probability of today’s value of the Dow Jones index being at least 9.5% lower than yesterday’s?

Page 7: Extreme values

Log daily return (LDR) = log(value today / value yesterday)

Value drops by 9.5% ⇒ LDR = log(0.905) ≈ –0.10

Q. On this particular day, what is the chance of getting a log daily return of less than –0.10?
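
The conversion from a percentage fall to a log daily return can be checked with a few lines of Python. This is only an illustrative sketch: the function name and the index levels are made up, and only the 9.5% fall comes from the slide.

```python
import math

def log_daily_return(value_today, value_yesterday):
    """Log daily return: log(value today / value yesterday)."""
    return math.log(value_today / value_yesterday)

# A 9.5% fall means today's value is 90.5% of yesterday's.
# The index levels 100.0 and 90.5 are illustrative, not real data.
ldr = log_daily_return(90.5, 100.0)
print(round(ldr, 4))
```

The result is very close to –0.10, which is the level used in the question above.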

Page 8: Extreme values

Dow Jones Data for the period 1996-2000

Page 9: Extreme values

To answer this question we clearly need to extrapolate, since –0.1 is well outside the range of the data…

Extrapolation should be avoided whenever possible, but in many real-life problems it is unavoidable

Page 10: Extreme values

So how should we go about estimating this probability?

We could assume that the data are normally distributed…

Page 11: Extreme values

P(X < –0.1) ≈ 10^-20

Page 12: Extreme values

…but the extreme values that have been observed don’t play much of a role when we estimate the parameters (e.g. the mean and variance)

Hence, our chosen model (e.g. the normal distribution) might do badly in describing their properties…

Page 13: Extreme values

Empirical: P(X < –0.05) ≈ 0.002

Normal: P(X < –0.05) ≈ 0.000001

Page 14: Extreme values

…and, worse still, extrapolations beyond the range of the data often differ radically between models that provide a very similar fit to the bulk of the data

For example, we might decide to fit a Cauchy rather than a normal distribution…

Page 15: Extreme values

Cauchy: P(X < –0.1) ≈ 0.02

Normal: P(X < –0.1) ≈ 10^-20
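
To see how two bell-shaped models can disagree so sharply in the tail, the sketch below evaluates a normal and a Cauchy lower-tail probability at –0.1. The scale parameters are hypothetical values chosen to give magnitudes similar to those on the slides; they are not the fitted values from the talk, which the slides do not report.

```python
import math

def normal_cdf(x, mean, sd):
    """Normal CDF via erfc, which stays accurate far out in the lower tail."""
    return 0.5 * math.erfc((mean - x) / (sd * math.sqrt(2.0)))

def cauchy_cdf(x, location, scale):
    """Cauchy CDF."""
    return 0.5 + math.atan((x - location) / scale) / math.pi

# Hypothetical parameters (assumptions, not taken from the slides).
p_normal = normal_cdf(-0.1, mean=0.0, sd=0.0105)
p_cauchy = cauchy_cdf(-0.1, location=0.0, scale=0.0063)

print(f"normal: {p_normal:.1e}   Cauchy: {p_cauchy:.3f}")
```

Under these assumed parameters the two answers differ by many orders of magnitude, even though both densities are symmetric and concentrated near zero.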

Page 16: Extreme values
Page 17: Extreme values

We need an alternative statistical approach that is more robust, in the sense that it does not require us to make strong and untestable assumptions about the process that is generating our data

This is the motivation for EVT – Extreme Value Theory

Page 18: Extreme values

Motivation

Motivation
2. What is EVT?
Applications
Current research

Page 19: Extreme values

General characteristics of an “EVT” problem

• We are interested in a process that can be quantified, and for which we have some data

• …and we want to use these data to say something about the probability that a rare or extreme event will occur

• We will usually be interested in events that are beyond the range of the data, i.e. we want to extrapolate

Page 20: Extreme values

To deal with such problems, we begin from the principle that our inferences should only be based on the most extreme data that we have actually observed

i.e. we should throw away almost all of the data

Page 21: Extreme values

Extreme value theory (EVT) then provides us with some simple and robust models that can be used to describe the properties of these extreme data

Page 22: Extreme values

Q. What is the probability of getting more than 100mm of rain on any given day?

Page 23: Extreme values

We might decide to only use data for days with 25mm or more of rainfall…

Page 24: Extreme values

Histogram of data above a threshold of 25mm

Page 25: Extreme values

Threshold exceedance = Value - Threshold
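
A minimal sketch of this step in Python is shown below. The rainfall values are invented for illustration, and the choice to keep only values strictly above the threshold (so that every exceedance is positive) is an assumption.

```python
def threshold_exceedances(values, threshold):
    """Return value - threshold for every value strictly above the threshold."""
    return [v - threshold for v in values if v > threshold]

# Made-up daily rainfall totals in mm (not data from the talk).
daily_rain_mm = [0.0, 3.2, 27.5, 12.1, 40.0, 25.0, 31.3]
exceedances = threshold_exceedances(daily_rain_mm, 25.0)
print(exceedances)
```

Only the three days above 25mm contribute, and each contributes the amount by which it exceeds the threshold.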

Page 26: Extreme values

The GPD model

• A good statistical model for threshold exceedances is the GPD (Generalised Pareto Distribution)

• The cumulative distribution function is of the form

$F(x) = 1 - (1 + \xi x / \sigma)^{-1/\xi}$

• There are two parameters, a scale parameter σ and a shape parameter ξ, which need to be estimated
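
As a sketch, the GPD distribution function can be written directly in Python. The function name is hypothetical; the exponential form used when the shape is zero is the standard limiting case, and the handling of values beyond the upper end point (which exists when the shape is negative) is included for completeness.

```python
import math

def gpd_cdf(x, scale, shape):
    """P(exceedance <= x) under a GPD with the given scale and shape."""
    if x <= 0.0:
        return 0.0
    if abs(shape) < 1e-12:
        # Limiting case shape -> 0: exponential distribution.
        return 1.0 - math.exp(-x / scale)
    base = 1.0 + shape * x / scale
    if base <= 0.0:
        # Only possible when shape < 0: x is beyond the upper end point.
        return 1.0
    return 1.0 - base ** (-1.0 / shape)

print(gpd_cdf(1.0, scale=1.0, shape=1.0))
```

With scale 1 and shape 1 the formula reduces to 1 - 1/(1 + x), so the call above gives one half.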

Page 27: Extreme values

Threshold = u = 25mm

Scale σ and shape ξ estimated by maximum likelihood to be 7.70 and 0.108

P(X > 100) estimated to be 0.0000209 (once per 131 years)

GPD model fitted to threshold exceedances
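
The quoted figures can be related to one another with a short calculation. The sketch below assumes that P(X > 100) is the product of the probability of exceeding the threshold and the GPD tail probability of an exceedance above 75mm, and that a year has 365.25 days; the threshold exceedance rate is not stated on the slide, so the value computed here is only what the quoted numbers imply.

```python
scale, shape = 7.70, 0.108        # estimates quoted on the slide
threshold, level = 25.0, 100.0    # mm
p_level = 0.0000209               # P(X > 100) quoted on the slide

# GPD tail: P(X > level | X > threshold)
cond_tail = (1.0 + shape * (level - threshold) / scale) ** (-1.0 / shape)

# Implied P(X > threshold), assuming P(X > level) = P(X > threshold) * cond_tail
implied_rate = p_level / cond_tail

# Average waiting time between such days, assuming 365.25 days per year
return_period_years = 1.0 / (p_level * 365.25)

print(round(cond_tail, 6), round(implied_rate, 4), round(return_period_years, 1))
```

The return period comes out at about 131 years, matching the slide, and the implied share of days above 25mm is between 1% and 2%. Because the quoted estimates are rounded, these values are approximate.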

Page 28: Extreme values

• But why is the GPD a good model to use? The mathematical justification is given by asymptotic theory

• The theory says that, for almost any random variable X, the exceedances of a high threshold u will tend towards following the GPD model as u tends towards infinity

• In practice, we use a threshold that is high but still finite: we rely on the fact that if this level is sufficiently high then the asymptotic result will still be approximately true
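
One case where the result can be checked by simulation is the exponential distribution, whose exceedances above any threshold are again exponential (a GPD with shape 0). The sketch below uses that special case purely as an illustration; here the GPD form holds exactly rather than only asymptotically, and the sample size, seed and threshold are arbitrary choices.

```python
import math
import random

random.seed(1)
sample = [random.expovariate(1.0) for _ in range(200_000)]

u = 3.0  # an arbitrary high threshold
exc = [x - u for x in sample if x > u]

# For a GPD with scale 1 and shape 0, the mean is 1 and P(exceedance > 1) = exp(-1).
mean_exc = sum(exc) / len(exc)
frac_above_1 = sum(1 for e in exc if e > 1.0) / len(exc)

print(len(exc), round(mean_exc, 3), round(frac_above_1, 3), round(math.exp(-1.0), 3))
```

For other parent distributions the agreement is only approximate and improves as the threshold rises, which is the trade-off discussed on the next slide.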

Page 29: Extreme values

When choosing a threshold, we need to balance

• Precision: if the threshold is low then our results will tend to be more certain than if it is high

• Bias: extreme value methods will only be valid when the threshold is sufficiently high

We can do this in a partly subjective way using parameter stability plots
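
The idea behind a parameter stability plot can be sketched numerically: refit the GPD at several thresholds and check that the shape, and the scale adjusted by subtracting shape × threshold, stay roughly constant. To keep the example dependency-free, the sketch below uses simple method-of-moments estimates rather than the maximum likelihood fits that such plots normally use, and it works on simulated GPD data with hypothetical parameters.

```python
import random

def gpd_moment_estimates(exceedances):
    """Method-of-moments estimates (scale, shape) for a GPD; needs shape < 0.5."""
    n = len(exceedances)
    mean = sum(exceedances) / n
    var = sum((e - mean) ** 2 for e in exceedances) / (n - 1)
    ratio = mean * mean / var
    return 0.5 * mean * (ratio + 1.0), 0.5 * (1.0 - ratio)

random.seed(42)
true_scale, true_shape = 1.0, 0.1   # hypothetical values for the simulation
# Inverse-CDF simulation of GPD data.
data = [true_scale / true_shape * ((1.0 - random.random()) ** (-true_shape) - 1.0)
        for _ in range(50_000)]

stability = {}
for u in (0.0, 1.0, 2.0):
    exc = [x - u for x in data if x > u]
    scale_u, shape_u = gpd_moment_estimates(exc)
    # Modified scale (scale_u - shape_u * u) should not depend on u.
    stability[u] = (scale_u - shape_u * u, shape_u)

for u, (mod_scale, shape_u) in stability.items():
    print(u, round(mod_scale, 3), round(shape_u, 3))
```

In a real analysis the estimates would be plotted against the threshold with confidence intervals, and the lowest threshold above which they look stable would be chosen.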

Page 30: Extreme values

Parameter stability plot for shape parameter, ξ

Page 31: Extreme values

The GEV model

• Another approach involves analysing block maxima

• For example, if we have hourly sea level data then we may choose to analyse only the largest value that occurs each year: the annual maximum value

• The same method can also be used to analyse minima
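
Extracting block maxima is straightforward; a minimal sketch follows. The record format of (block label, value) pairs and the sea level values are assumptions made for illustration.

```python
def block_maxima(records):
    """Largest value per block, from an iterable of (block_label, value) pairs."""
    maxima = {}
    for block, value in records:
        if block not in maxima or value > maxima[block]:
            maxima[block] = value
    return maxima

# Made-up sea levels in metres, labelled by year.
levels = [(2001, 3.1), (2001, 4.7), (2002, 2.9), (2002, 3.8), (2002, 3.5)]
annual_max = block_maxima(levels)
print(annual_max)
```

To analyse minima with the same function, one option is to negate the values first and negate the results afterwards.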

Page 32: Extreme values
Page 33: Extreme values

• A good statistical model for block maxima is the GEV (Generalised Extreme Value Distribution)

• The cumulative distribution function is of the form

$F(x) = \exp\{-[1 + \xi (x - \mu) / \sigma]^{-1/\xi}\}$

• There are three parameters - a location parameter μ, a scale parameter σ, and a shape parameter ξ - which need to be estimated
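
A sketch of the GEV distribution function in Python is given below. The function name is hypothetical; the Gumbel form used when the shape is zero is the standard limiting case, and values outside the support are assigned probability 0 or 1 depending on which side they fall.

```python
import math

def gev_cdf(x, loc, scale, shape):
    """P(block maximum <= x) under a GEV distribution."""
    s = (x - loc) / scale
    if abs(shape) < 1e-12:
        # Limiting case shape -> 0: Gumbel distribution.
        return math.exp(-math.exp(-s))
    z = 1.0 + shape * s
    if z <= 0.0:
        # Below the lower end point (shape > 0) or above the upper one (shape < 0).
        return 0.0 if shape > 0.0 else 1.0
    return math.exp(-z ** (-1.0 / shape))

print(gev_cdf(4.0, loc=4.0, scale=0.5, shape=0.1))
```

At x equal to the location parameter the value is exp(-1) whatever the scale and shape, which makes a convenient sanity check.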

Page 34: Extreme values
Page 35: Extreme values

The r-largest model

• The GEV model uses only one value per block

• An extension of this model involves using the r largest values per block, where r is greater than one

• e.g. We might model the 20 highest sea levels per year
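
Selecting the r largest values per block can be sketched as follows. The record format and the values are illustrative assumptions; the slide's example would use r = 20 on a year of sea level data.

```python
import heapq
from collections import defaultdict

def r_largest_per_block(records, r):
    """The r largest values in each block, in decreasing order."""
    by_block = defaultdict(list)
    for block, value in records:
        by_block[block].append(value)
    return {block: heapq.nlargest(r, values) for block, values in by_block.items()}

# Made-up sea levels in metres, labelled by year.
levels = [(2001, 3.1), (2001, 4.7), (2002, 2.9), (2002, 3.8), (2002, 3.5)]
top2 = r_largest_per_block(levels, r=2)
print(top2)
```

With r = 1 this reduces to the block maxima used by the GEV model.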

Page 36: Extreme values
Page 37: Extreme values

The shape parameter

• All of the extreme value models contain a common parameter that determines the shape of the distribution

• The extremes of a light-tailed distribution with a finite upper limit have a negative shape parameter (ξ < 0), and the extremes of a heavy-tailed distribution have a positive shape parameter (ξ > 0)

• The extreme values of a normal distribution have ξ = 0

Page 38: Extreme values

GPD: impact of the shape parameter, ξ (curves shown for ξ = 0, ξ = 1 and ξ = -0.5)
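
The effect of the shape parameter can also be seen numerically. The sketch below tabulates the GPD tail probability P(exceedance > x) for the three shape values shown on the slide, with the scale fixed at 1 as an assumption (the scale used for the figure is not given).

```python
import math

def gpd_survival(x, scale, shape):
    """P(exceedance > x) under a GPD, for x >= 0."""
    if abs(shape) < 1e-12:
        return math.exp(-x / scale)
    base = 1.0 + shape * x / scale
    # For shape < 0 the distribution has an upper end point at -scale / shape.
    return base ** (-1.0 / shape) if base > 0.0 else 0.0

for shape in (0.0, 1.0, -0.5):
    row = [round(gpd_survival(x, 1.0, shape), 4) for x in (1.0, 2.0, 5.0)]
    print(f"shape = {shape:4}: {row}")
```

With a shape of -0.5 and scale 1 the tail probability reaches zero at x = 2, whereas with a shape of 1 it decays only as 1/(1 + x).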

Page 39: Extreme values

Covariates

• The properties of extreme values may depend on time, location, or other covariates (explanatory variables)

• We can easily build these covariates into our extreme value models, in much the same way as we would build them into a regression model or GLM

• The key difference is that in a GLM we only build covariates into the mean, whereas in EV models we might build them into any of the three parameters
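
As a sketch of how a covariate enters the model, the function below evaluates the negative log-likelihood of a GEV whose location parameter is a linear function of time, with the scale and shape held constant. The parameter layout, the data and the trial parameter values are all hypothetical; in practice the function would be handed to a numerical optimiser to obtain maximum likelihood estimates.

```python
import math

def gev_nll_linear_trend(params, times, maxima):
    """Negative log-likelihood of a GEV with location beta0 + beta1 * t."""
    beta0, beta1, scale, shape = params
    if scale <= 0.0:
        return math.inf
    total = 0.0
    for t, x in zip(times, maxima):
        s = (x - (beta0 + beta1 * t)) / scale
        if abs(shape) < 1e-12:
            # Gumbel limit of the GEV density.
            total += math.log(scale) + s + math.exp(-s)
        else:
            z = 1.0 + shape * s
            if z <= 0.0:
                return math.inf  # observation outside the support
            total += math.log(scale) + (1.0 + 1.0 / shape) * math.log(z) + z ** (-1.0 / shape)
    return total

# Made-up annual maxima with an upward drift (not the Venice data).
times = [0, 1, 2, 3, 4]
maxima = [100.0, 104.0, 103.0, 109.0, 112.0]

nll_trend = gev_nll_linear_trend((100.0, 3.0, 3.0, 0.0), times, maxima)
nll_flat = gev_nll_linear_trend((100.0, 0.0, 3.0, 0.0), times, maxima)
print(round(nll_trend, 3), round(nll_flat, 3))
```

For these invented data the trial parameters with a positive slope give a lower negative log-likelihood than the same parameters with no slope. Trends in the scale or shape could be added in the same way by making those parameters functions of t.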

Page 40: Extreme values

Venice sea level data – linear trend in location parameter

Page 41: Extreme values

More advanced statistical modelling

• Methods to deal with clustering: e.g. declustering algorithms, estimation of the extremal index

• Semiparametric modelling: allow trends to vary smoothly over time, using local likelihood or smoothing splines

• Bayesian methods: allow for the incorporation of prior information, and for the construction of relatively complicated hierarchical models

Page 42: Extreme values

Example of semiparametric modelling: estimated trends in storm surge levels at Dover

Page 43: Extreme values

Software

• Add-on packages are available for R (extRemes, ismev, evir, evd, evdbayes), Splus (EVIS, S+FinMetrics) and Matlab (EVIM, EXTREMES)

• The extremes toolkit provides a user-friendly interface - www.isse.ucar.edu/extremevalues/evtk.html

• Some methods are also available in Genstat

• Stand-alone commercial software: Xtremes, HYFRAN

Page 44: Extreme values

Should I be using EVT?

Advantages

• Robust: relies on weak assumptions; avoids bias

• Theoretically sound: justified by asymptotic theory

• Quick & relatively easy to use

• Honest …about the uncertainties involved in making statements about very rare events

Disadvantages

• Inefficient: most of the data are thrown away, so we may over-estimate uncertainty, and the approach relies on having a large sample size

• Asymptotics: the theory only holds exactly for infinitely extreme events; difficult to extend to the multivariate case

• Data quality: sensitive to errors in extreme data

Page 45: Extreme values

Motivation

Motivation
What is EVT?
3. Applications
Current research

Page 46: Extreme values

Environmental sciences

• EVT is widely used by scientists working in hydrology, climatology, oceanography and fire science

• It is also used for operational purposes in flood risk assessment and civil engineering

• Particular interest in studying the impact of climate change upon extreme events – e.g.

MICE project (www.cru.uea.ac.uk/projects/mice)

WASA project: Waves & Storms in the NE Atlantic

Page 47: Extreme values

Thames Barrier, London

Source: Roger Haworth

http://en.wikipedia.org/wiki/Image:Thames_Barrier_059184.jpg

Page 48: Extreme values

Risk assessment and design

• Extreme value problems in hydrology and coastal engineering are often phrased in terms of return levels

• N-year return level: the level that is exceeded with probability 1/N in a particular year – definition applies to nonstationary processes too, but interpretation is harder

• e.g. Thames Barrier: “…was originally designed to protect London against a flood level with a return period of 1000 years in the year 2030…” (Wikipedia)
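
For a stationary GEV model of annual maxima, the N-year return level is the quantile with non-exceedance probability 1 - 1/N, which can be obtained by inverting the distribution function. The sketch below uses hypothetical parameter values; they are not estimates for the Thames or any other site.

```python
import math

def gev_cdf(x, loc, scale, shape):
    """P(annual maximum <= x) under a GEV distribution."""
    s = (x - loc) / scale
    if abs(shape) < 1e-12:
        return math.exp(-math.exp(-s))
    z = 1.0 + shape * s
    if z <= 0.0:
        return 0.0 if shape > 0.0 else 1.0
    return math.exp(-z ** (-1.0 / shape))

def gev_return_level(n_years, loc, scale, shape):
    """Level exceeded with probability 1/n_years in any one year."""
    y = -math.log(1.0 - 1.0 / n_years)
    if abs(shape) < 1e-12:
        return loc - scale * math.log(y)
    return loc - (scale / shape) * (1.0 - y ** (-shape))

# Hypothetical GEV parameters for annual maximum sea level, in metres.
loc, scale, shape = 4.0, 0.5, 0.1
level_1000 = gev_return_level(1000, loc, scale, shape)
print(round(level_1000, 3), round(1.0 - gev_cdf(level_1000, loc, scale, shape), 6))
```

The second printed number is the annual exceedance probability at the computed level, which should be 1/1000.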

Page 49: Extreme values

Biology

• Biologists are also often interested in studying the properties of extreme or rare events, but rarely use EVT

• Some likely reasons –

  • Relatively small sample sizes (compared to e.g. hydrology)

  • Extreme events not so easily defined in quantitative terms

• New applications are likely to arise from the increasing use of large datasets (e.g. in genetics), and from an increased focus on quantitative risk assessment

Page 50: Extreme values

Genetics

A major application of EVT is in sequence alignment, and extreme value models are used by BLAST and FASTA

Compare a sequence against a vast database of known sequences -

1. define a similarity score

2. search for the best match within the database

3. use EVT to evaluate the significance of this match

“…a sequence alignment is a way of arranging the primary sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences…” (Wikipedia)
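
Step 3 can be sketched using the general form of the extreme value statistics commonly described for local alignment scores: the expected number of chance hits with score at least S is E = K·m·n·exp(-λS), and the corresponding p-value is 1 - exp(-E). The values of K, λ, the sequence lengths and the score below are hypothetical; real tools derive K and λ from the scoring system and apply further corrections not shown here.

```python
import math

def alignment_significance(score, query_len, db_len, k, lam):
    """Expected number of chance hits (E) and p-value for a best-match score."""
    expected_hits = k * query_len * db_len * math.exp(-lam * score)
    p_value = -math.expm1(-expected_hits)  # 1 - exp(-E), computed accurately
    return expected_hits, p_value

# Hypothetical inputs, chosen only for illustration.
e_value, p_value = alignment_significance(
    score=80.0, query_len=300, db_len=10_000_000, k=0.1, lam=0.3
)
print(f"E = {e_value:.4f}, p = {p_value:.4f}")
```

When E is small the p-value is almost equal to E, which is why the two are often quoted interchangeably for strong matches.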

Page 51: Extreme values

Ecology

Review papers by Gaines & Denny (1993) and Katz et al. (2005) focus on disturbance – studying the extremes of environmental processes that are known to lead to ecological disturbance

e.g. sediment rates, fire sizes, frost days

They also consider longevity & survival – i.e. studying the maximum lifespan or size of an individual

Page 52: Extreme values

Bumblebee on Echinacea purpurea

http://en.wikipedia.org/wiki/Image:Zigouillonbourdon.jpg

Page 53: Extreme values

Possible new applications in ecology

• Dispersal & spread: spatial spread (of diseases, pollen, invasive species) known to be influenced by long-range dispersal; can EVT be used to analyse dispersal data?

• Population dynamics: estimating the probability of extinction or explosion of a population

• Ecological modelling: study the properties of extreme events simulated by complex process-based ecological models – e.g. mass extinction events

Page 54: Extreme values

Other areas where EVT is used

• Finance and insurance: in particular, calculation of Value at Risk (www.gloriamundi.org)

• Telecommunications: e.g. estimation of very large file sizes in internet traffic

• Sport science: trends in record times for athletics

…and many, many more…

Page 55: Extreme values

Motivation

Motivation
What is EVT?
Applications
4. Current research

Page 56: Extreme values

Extreme value theory remains an area of active methodological research, with two key strands:

1) Improving the practical utility of existing extreme value methods by making use of recent developments in statistics and computing e.g. Bayesian extremes

2) Developing methods for multivariate extremes – this involves much theoretical work

Page 57: Extreme values

Multivariate extremes

• Standard (“univariate”) extreme value methods concentrate on the extremes of a single random variable

• Multivariate extreme value theory studies how the values of different variables are related at extreme levels

• The different random variables may relate to genuinely different processes (e.g. tide and waves) or to the same process at different locations (spatial extremes)

Page 58: Extreme values

Some applications:

1) Calculating the risk that there will be a fall in the overall value of a portfolio of investments

2) Assessing regional flood risk e.g. estimating the probability that a severe flood will occur at one or more locations within a region

3) Evaluating the probability that two atmospheric pollutants will simultaneously reach hazardous levels

Page 59: Extreme values
Page 60: Extreme values

Two random variables X1 and X2 may either be…

• Asymptotically dependent: extreme values of X2 tend to occur when X1 is also extreme

• Asymptotically independent: extreme values of X2 tend to occur when X1 is not extreme, and vice-versa
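
One common empirical diagnostic, not named on the slides and introduced here as an assumption, is the conditional probability that X2 exceeds its q-quantile given that X1 exceeds its own q-quantile. Values that stay well above zero as q approaches 1 suggest asymptotic dependence; values that fall towards zero suggest asymptotic independence. The sketch below applies it to two simulated extremes: a pair of perfectly dependent series and a pair of independent series.

```python
import random

def chi_hat(x1, x2, q):
    """Estimate P(X2 above its q-quantile | X1 above its q-quantile)."""
    k = int(q * len(x1))
    u1 = sorted(x1)[k]
    u2 = sorted(x2)[k]
    both = sum(1 for a, b in zip(x1, x2) if a > u1 and b > u2)
    first = sum(1 for a in x1 if a > u1)
    return both / first

random.seed(0)
n = 20_000
z = [random.gauss(0.0, 1.0) for _ in range(n)]
w = [random.gauss(0.0, 1.0) for _ in range(n)]

chi_dependent = chi_hat(z, [2.0 * v for v in z], q=0.95)   # X2 is a rescaled copy of X1
chi_independent = chi_hat(z, w, q=0.95)                    # X2 unrelated to X1

print(round(chi_dependent, 3), round(chi_independent, 3))
```

Real data usually fall between these two cases, and a single value of q cannot settle the question; in practice the estimate would be examined over a range of high quantiles.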

Page 61: Extreme values
Page 62: Extreme values

• There is a rich mathematical theory about asymptotic dependence, providing us with statistical models that we can use if we are prepared to make this assumption

• There is little theory about asymptotic independence, and practical techniques for dealing with data that exhibit this have only been developed in the past 5-10 years

• There are relatively few practical techniques for testing whether your data exhibit asymptotic (in)dependence

Page 63: Extreme values

Thank you for listening!

Email: [email protected]
Phone: 0131 650 4896