33
SAMSI Distinguished, October 200 6 Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Embed Size (px)

Citation preview

Page 1: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

SAMSI Distinguished, October 2006 Slide 1

Tony O’Hagan, University of Sheffield

Managing Uncertainty in Complex Models

Page 2: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 2http://mucm.group.shef.ac.uk

Outline

Uncertainty in models Quantifying uncertainty

Example: dynamic global vegetation model Reducing uncertainty

Example: atmospheric deposition model Methods Research directions

Dynamic models MUCM

Page 3: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 3http://mucm.group.shef.ac.uk

Computer models

In almost all fields of science, technology, industry and policy making, people use mechanistic models to describe complex real-world processes For understanding, prediction, control

There is a growing realisation of the importance of uncertainty in model predictions Can we trust them? Without any quantification of output uncertainty,

it’s easy to dismiss them

Page 4: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 4http://mucm.group.shef.ac.uk

Examples

Climate prediction

Molecular dynamics

Nuclear waste disposal

Oil fields Engineering

design Hydrology

Page 5: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 5http://mucm.group.shef.ac.uk

Sources of uncertainty

A computer model takes inputs x and produces outputs y = f(x)

How might y differ from the true real-world value z that the model is supposed to predict? Error in inputs x

Initial values, forcing inputs, model parameters Error in model structure or solution

Wrong, inaccurate or incomplete science Bugs, solution errors

Page 6: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 6http://mucm.group.shef.ac.uk

Quantifying uncertainty

The ideal is to provide a probability distribution p(z) for the true real-world value The centre of the distribution is a best estimate Its spread shows how

much uncertainty about z is induced by uncertainties on the last slide

How do we get this? Input uncertainty: characterise p(x), propagate

through to p(y) Structural uncertainty: characterise p(z-y)

Page 7: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 7http://mucm.group.shef.ac.uk

Example: UK carbon flux in 2000

Vegetation model predicts carbon exchange from each of 700 pixels over England & Wales Principal output is Net Biosphere Production

Accounting for uncertainty in inputs Soil properties Properties of different types of vegetation

Aggregated to England & Wales total Allowing for correlations Estimate 7.61 Mt C Std deviation 0.61 Mt C

Page 8: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 8http://mucm.group.shef.ac.uk

Maps

Page 9: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 9http://mucm.group.shef.ac.uk

Sensitivity analysis

Map shows proportion of overall uncertainty in each pixel that is due to uncertainty in the vegetation parameters As opposed to soil

parameters Contribution of

vegetation uncertainty is largest in grasslands/moorlands

Page 10: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 10http://mucm.group.shef.ac.uk

England & Wales aggregate

PFTPlug-in estimate

(Mt C)Mean(Mt C)

Variance (Mt C2)

Grass 5.28 4.65 0.323

Crop 0.85 0.50 0.038

Deciduous 2.13 1.69 0.009

Evergreen 0.80 0.78 0.001

Covariances 0.001

Total 9.06 7.61 0.372

Page 11: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 11http://mucm.group.shef.ac.uk

Reducing uncertainty

To reduce uncertainty, get more information! Informal – more/better science

Tighten p(x) through improved understanding Tighten p(z-y) through improved modelling or

programming Formal – using real-world data

Calibration – learn about model parameters Data assimilation – learn about the state variables Learn about structural error z-y Validation

Page 12: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 12http://mucm.group.shef.ac.uk

Example: Nuclear accident

Radiation was released after an accident at the Tomsk-7 chemical plant in 1993

Data comprise measurements of the deposition of ruthenium 106 at 695 locations obtained by aerial survey after the release

The computer code is a simple Gaussian plume model for atmospheric dispersion

Two calibration parameters Total release of 106Ru (source term) Deposition velocity

Page 13: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 13http://mucm.group.shef.ac.uk

Data

Page 14: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 14http://mucm.group.shef.ac.uk

A small sample (N=10 to 25) of the 695 data points was used to calibrate the model

Then the remaining observations were predicted and RMS prediction error computed

On a log scale, error of 0.7 corresponds to a factor of 2

Calibration

Sample size N 10 15 20 25

Best fit calibration 0.82 0.79 0.76 0.66

Bayesian calibration 0.49 0.41 0.37 0.38

Page 15: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 15http://mucm.group.shef.ac.uk

So far, so good, but

In principle, all this is straightforward In practice, there are many technical difficulties

Formulating uncertainty on inputs Elicitation of expert judgements

Propagating input uncertainty Modelling structural error Anything involving observational data!

The last two are intricately linked And computation

Page 16: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 16http://mucm.group.shef.ac.uk

The problem of big models

Tasks like uncertainty propagation and calibration require us to run the model many times

Uncertainty propagation Implicitly, we need to run f(x) at all possible x Monte Carlo works by taking a sample of x from p(x) Typically needs thousands of model runs

Calibration Traditionally this is done by searching the x space for

good fits to the data This is impractical if the model takes more than a few

seconds to run We need a more efficient technique

Page 17: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 17http://mucm.group.shef.ac.uk

Gaussian process representation

More efficient approach First work in early 1980s (DACE)

Consider the code as an unknown function f(.) becomes a random process We represent it as a Gaussian process (GP)

Training runs Run model for sample of x values Condition GP on observed data Typically requires many fewer runs than MC

And x values don’t need to be chosen randomly

Page 18: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 18http://mucm.group.shef.ac.uk

Emulation

Analysis is completed by prior distributions for, and posterior estimation of, hyperparameters

The posterior distribution is known as an emulator of the computer code Posterior mean estimates what the code would

produce for any untried x (prediction) With uncertainty about that prediction given by

posterior variance Correctly reproduces training data

Page 19: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 19http://mucm.group.shef.ac.uk

2 code runs

Consider one input and one output Emulator estimate interpolates data Emulator uncertainty grows between data points

Page 20: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 20http://mucm.group.shef.ac.uk

3 code runs

Adding another point changes estimate and reduces uncertainty

Page 21: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 21http://mucm.group.shef.ac.uk

5 code runs

And so on

Page 22: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 22http://mucm.group.shef.ac.uk

Then what?

Given enough training data points we can emulate any model accurately So that posterior variance is small “everywhere” Typically, this can be done with orders of

magnitude fewer model runs than traditional methods

Use the emulator to make inference about other things of interest E.g. uncertainty analysis, calibration

Conceptually very straightforward in the Bayesian framework But of course can be computationally hard

Page 23: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 23http://mucm.group.shef.ac.uk

DACE and BACCO

This has led to a wide ranging body of tools for inference about all kinds of uncertainties in computer models

All based on building the GP emulator of the model from a set of training runs

Early work concentrated on prediction – DACE Design and Analysis of Computer Experiments

More recently, wider scope and emphasis on the Bayesian framework – BACCO Bayesian Analysis of Computer Code Output

Page 24: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 24http://mucm.group.shef.ac.uk

BACCO includes

Uncertainty analysis Sensitivity analysis Calibration Data assimilation Model validation Optimisation Etc… All within a single coherent framework

Page 25: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 25http://mucm.group.shef.ac.uk

Research directions

Models with heterogeneous local behaviour Regions of input space with rapid response, jumps

High dimensional models Many inputs, outputs, data points

Dynamic models Data assimilation

Stochastic models Relationship between models and reality

Model/emulator validation Multiple models

Design of experiments Sequential design

Page 26: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 26http://mucm.group.shef.ac.uk

Dynamic models

Initial state x0 is updated recursively by the one-step model f1(x,c)

Forcing inputs ct

Interested in sequence x1, x2, … xT At least 4 approaches to emulating this

XTX0

c1

f1

X1

c2

f1

X2

c3

f1

XT-2

cT-1

f1

XT-1

cT

f1

Page 27: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 27http://mucm.group.shef.ac.uk

1. Treat time as input

Emulate xt as the function

f(x0,t) = ft(x0,ct) = f1(xt-1,ct)

= f1(… f1(f1(x0,c1),c2)…,ct)

Easy to do Hard to get the temporal correlation structure

right

XTX0

c1

f1

X1

c2

f1

X2

c3

f1

XT-2

cT-1

f1

XT-1

cT

f1

Page 28: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 28http://mucm.group.shef.ac.uk

2. Multivariate emulation

Emulate the vector x = (x1, x2, … xT) as the multi-output function

fT(x0) = (f1(x0,c1), f2(x0,c2), …, fT(x0,cT)) Simple extension of univariate theory Restrictive covariance structure

XTX0

c1

f1

X1

c2

f1

X2

c3

f1

XT-2

cT-1

f1

XT-1

cT

f1

Page 29: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 29http://mucm.group.shef.ac.uk

3. Functional output emulation

Treat x series as a functional output Identify features (e.g. principal components) to

summarise function Same theory as for multivariate emulation, but

lower dimension Loss of information, no longer reproduces

training data

XTX0

c1

f1

X1

c2

f1

X2

c3

f1

XT-2

cT-1

f1

XT-1

cT

f1

Page 30: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 30http://mucm.group.shef.ac.uk

4. Recursive emulation

Emulate the single step function

f1(x,c) Iterate the emulator numerically

May take longer than original model! Or approximate filtering algorithm

May be inaccurate for large T

XTX0

c1

f1

X1

c2

f1

X2

c3

f1

XT-2

cT-1

f1

XT-1

cT

f1

Page 31: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 31http://mucm.group.shef.ac.uk

Some comparisons

Multivariate emulation (approach 2) is better than treating time as an input (approach 1) Validates better Able to work with partial series

Recursive emulation still under development Looking promising

Only recursive emulation can realistically treat uncertainty in forcing inputs

Only recursive emulation can extend T beyond training data

Page 32: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 32http://mucm.group.shef.ac.uk

MUCM Managing Uncertainty in Complex Models

Large 4-year research grant 7 postdoctoral research assistants 4 PhD studentships

Started in June 2006 Based in Sheffield and 4 other UK universities

Objective: To develop BACCO methods into a robust

technology … toolkit

that is widely applicable across the spectrum of modelling applications

case studies

Page 33: SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

Slide 33http://mucm.group.shef.ac.uk

Thank you