Case studies in Gaussian process modelling of computer codes for
carbon accounting
Marc Kennedy, Clive Anderson, Stefano Conti, Tony O’Hagan
Talk Outline
Centre for Terrestrial Carbon Dynamics
Computer Models in CTCD
Bayesian emulators
Case Study 1: SPA
Case Study 2: SDGVM
Centre for Terrestrial Carbon Dynamics
The CTCD…
is a NERC centre of excellence for Earth Observation
made up of groups from Sheffield, York, Edinburgh, UCL, Forest Research
brings together experts in vegetation modelling, soil science, earth observation, carbon flux measurement and statistics
Net Ecosystem Production
Carbon fluxes: photosynthesis (gain), plant respiration (loss), soil respiration (loss)
– Terrestrial carbon source if NEP is negative
– Terrestrial carbon sink if NEP is positive
Computer Models in CTCD
SPA
– Simulates plant processes at 30-minute time intervals
ForestETP
– Stand scale
– Localised modelling
SDGVM
– Global scale
– Coarse resolution
Statistical objectives within CTCD
Contribute to the development of these models
– through model testing using sensitivity analysis
Identify the greatest sources of uncertainty
Correctly reflect the uncertainty in predictions
– Uncertainty analysis: propagating the parameter uncertainty through the model
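As a minimal sketch of what propagating parameter uncertainty through a model means, the example below draws inputs from assumed probability distributions, runs a model on each draw, and summarises the spread of the output. The function `toy_model` and the uniform input distributions are hypothetical stand-ins, not any of the CTCD models.

```python
import numpy as np

def toy_model(x):
    """Hypothetical stand-in for an expensive simulator; x has shape (n, 2)."""
    return 10.0 + 3.0 * x[:, 0] + x[:, 1] ** 2

rng = np.random.default_rng(0)
n = 100_000

# Assumed input uncertainty: two independent uniform parameters on [0, 1].
x = rng.uniform(0.0, 1.0, size=(n, 2))

# Propagate: one model run per sampled parameter vector.
y = toy_model(x)

# Summarise the induced uncertainty in the output.
mean_y = y.mean()
sd_y = y.std(ddof=1)
print(f"output mean = {mean_y:.3f}, output std deviation = {sd_y:.3f}")
```

Direct sampling like this is only feasible when the model is cheap to run, which is one motivation for replacing an expensive code with a fast approximation.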
Bayesian Emulation of Models
Model output is an unknown function of its inputs
– Convenient prior is a Gaussian process
– Run code at set of ‘well chosen’ input points
– Obtain posterior distribution
The emulator is the posterior distribution of the output
– Fast approximation
– Measure of uncertainty
– Nice analytical form for further analysis
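The steps above can be sketched in a few lines. This is an illustration under strong assumptions, not the emulator used in the talk: the `simulator` function is a hypothetical cheap stand-in for the code, the prior is a zero-mean Gaussian process with a squared-exponential covariance, and the variance and length-scale are fixed by hand rather than estimated.

```python
import numpy as np

def simulator(x):
    """Hypothetical cheap stand-in for an expensive computer code."""
    return np.sin(2.0 * np.pi * x) + x

def sq_exp_kernel(a, b, variance=1.0, length=0.2):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)

# Run the code at a small set of chosen input points.
x_train = np.linspace(0.0, 1.0, 8)
y_train = simulator(x_train)

# Condition the GP prior on the runs (small jitter for numerical stability).
K = sq_exp_kernel(x_train, x_train) + 1e-10 * np.eye(len(x_train))
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))

def emulate(x_new):
    """Posterior mean and variance of the output at new input points."""
    k_star = sq_exp_kernel(x_new, x_train)
    mean = k_star @ alpha
    v = np.linalg.solve(L, k_star.T)
    var = sq_exp_kernel(x_new, x_new).diagonal() - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)

x_test = np.linspace(0.0, 1.0, 101)
mean, var = emulate(x_test)
max_err = np.max(np.abs(mean - simulator(x_test)))
print(f"largest emulator error on the test grid: {max_err:.4f}")
```

The posterior mean plays the role of the fast approximation, and the posterior variance is the measure of uncertainty: it is essentially zero at the training inputs and grows between them.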
Case study 1: Soil Plant Atmosphere (SPA) Model
SPA is a fine-scale model created by Mat Williams
– Aggregated SPA outputs were used to create the simpler up-scaled model (ACM: the Aggregated Canopy Model) by fitting a set of simple equations with 9 parameters
Can an emulator do any better than ACM as an approximation to SPA?
ACM vs. Emulator for predicting SPA
Bayesian emulator created using only 150 of the total 6561 points used to create ACM
Predicted the remaining 6411 SPA points using the emulator and ACM
– Compare Root Mean Square Errors (RMSE)
[Figure: two scatter plots of SPA predictions against emulator predictions and against ACM predictions; all axes run from 0 to 15]
RMSE = 0.314 using emulator
RMSE = 0.726 using ACM
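For reference, the RMSE used in this comparison is the square root of the mean squared difference between predictions and the true model output on the held-out points. A small sketch follows; the arrays are made-up numbers for illustration and are not SPA, emulator or ACM values.

```python
import numpy as np

def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

# Made-up held-out outputs and two competing approximations.
truth = np.array([1.0, 2.0, 3.0, 4.0])
approx_a = np.array([1.1, 1.9, 3.2, 3.8])
approx_b = np.array([1.5, 2.5, 2.0, 5.0])

rmse_a = rmse(approx_a, truth)
rmse_b = rmse(approx_b, truth)
print(f"RMSE A = {rmse_a:.3f}, RMSE B = {rmse_b:.3f}")
```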
Case Study 2: Sheffield Dynamic Global Vegetation Model
SDGVM is a point model
– each pixel represents an area, with an associated vegetation type / land use
Vegetation type is described using 14 plant functional type parameters
SDGVM is constantly being developed
– To improve process modelling
– To incorporate more detailed driving data
Plant Functional Type inputs
Examples:
– Leaf life span
– Leaf area
– Temperature when bud bursts
– Temperature when leaf falls
– Wood density
– Maximum carbon storage
– Xylem conductivity
Emulator will allow small groups of inputs to vary, others fixed at original default values
Soil inputs
– Soil clay %
– Soil sand %
– Soil depth
– Bulk density
Emulator for SDGVM
Built an emulator for the NEP output of SDGVM
– 80 runs in the 5-dimensional input space were used as training data
– A maximin Latin hypercube design was used to ensure even coverage of the input space. Plant scientists specified the ranges
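One simple way to obtain a maximin Latin hypercube is sketched below: generate many random Latin hypercubes and keep the one whose closest pair of points is furthest apart. This random-search approach is an assumption made for illustration; the talk does not say which algorithm was used, and the `lower` and `upper` ranges are hypothetical placeholders, not the ranges specified by the plant scientists.

```python
import numpy as np

def latin_hypercube(n_runs, n_inputs, rng):
    """One random Latin hypercube sample on the unit cube."""
    # Each column is a random permutation of the n_runs strata,
    # with a uniform jitter inside each stratum.
    strata = np.array([rng.permutation(n_runs) for _ in range(n_inputs)]).T
    return (strata + rng.uniform(size=(n_runs, n_inputs))) / n_runs

def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two design points."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist[np.triu_indices(len(points), k=1)].min()

def maximin_lhs(n_runs, n_inputs, n_candidates=200, seed=0):
    """Best of n_candidates random Latin hypercubes under the maximin criterion."""
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        candidate = latin_hypercube(n_runs, n_inputs, rng)
        score = min_pairwise_distance(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best

unit_design = maximin_lhs(n_runs=80, n_inputs=5)

# Hypothetical input ranges, used only to show the rescaling step.
lower = np.array([0.0, 0.0, 0.0, 0.0, 0.0])
upper = np.array([1.0, 10.0, 100.0, 5.0, 50.0])
design = lower + unit_design * (upper - lower)
print(design.shape)
```

Each row of `design` is one set of inputs at which the code would be run.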
[Figure: example design points passed through "Run code" to give model outputs]
Inputs (six design points, 5 inputs each):
254.0  6.304346  7.913044  20.28985  6.521775
330.0  8.739128  8.173912  13.4058   19.56525
326.0  8.30435   5.56522   7.971025  50.000023
145.0  5.521742  5.043478  0.72465   33.695625
236.0  9.43478   8.782606  1.08695   75.0
123.0  9.608696  9.478258  21.0145   71.739151
Outputs:
24.259, 14.24, 18.384, 36.204, -3.214, 1.774
… …
Model testing: Sensitivity analysis
We use sensitivity analysis for model checking and for model interpretation
Calculate main effects of each code input
– How does output change if we vary the input, averaged over other inputs?
Building the emulator has uncovered bugs
– simply by trying different combinations of input values
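A main effect can be illustrated by brute force: fix one input at a value, average the output over samples of the remaining inputs, and repeat along a grid. The sketch below uses Monte Carlo averaging on a hypothetical `toy_model` with independent uniform inputs; with a Gaussian process emulator these averages can instead be derived analytically, so this is not the computation used in the talk.

```python
import numpy as np

def toy_model(x):
    """Hypothetical stand-in for the code or its emulator; x has shape (n, 3)."""
    return 2.0 * x[:, 0] + np.sin(np.pi * x[:, 1]) + 0.0 * x[:, 2]

def main_effect(model, index, grid, n_inputs, n_samples=20_000, seed=0):
    """Mean output with input `index` fixed at each grid value."""
    rng = np.random.default_rng(seed)
    # Assumed input distribution: independent uniforms on [0, 1].
    base = rng.uniform(size=(n_samples, n_inputs))
    effect = np.empty(len(grid))
    for j, value in enumerate(grid):
        x = base.copy()
        x[:, index] = value          # fix the input of interest
        effect[j] = model(x).mean()  # average over the other inputs
    return effect

grid = np.linspace(0.0, 1.0, 5)
effect_0 = main_effect(toy_model, 0, grid, n_inputs=3)
effect_2 = main_effect(toy_model, 2, grid, n_inputs=3)
print("input 0:", np.round(effect_0, 3))
print("input 2:", np.round(effect_2, 3))
```

In this toy example the main effect of input 0 rises linearly, while input 2 gives a flat line because the output does not depend on it. A main effect plot with an unexpected shape is the kind of signal that can prompt a closer look at the code.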
Main Effect: Leaf life span
[Figure: mean NEP (0 to 30) against leaf life-span (100 to 350)]
Main Effect: Leaf life span (updated)
[Figure: mean NEP (0 to 30) against leaf life-span (100 to 350)]
Main Effect: Senescence Temperature
[Figure: mean NEP (0 to 30) against senescence (4 to 10)]
Main Effects: Soil inputs
Soil inputs had been fixed in SDGVM
Output sensitive to sand content, but not clay content, over these ranges
More detailed soil input data are now used
[Figure: two panels of mean NEP (0 to 30), against soil clay% (0 to 25) and against soil sand% (0 to 60)]
Error discovered in the soil module
[Figure: two panels, "Before…" and "After…", each showing NEP (-20 to 80) against bulk density (0 to 1500000)]
SDGVM: new sensitivity analysis
We initially analysed uncertainty in the NEP output at a single test site, using rough ranges for the 14 plant functional type parameters
Assumed default (uniform) probability distributions for the parameters
The aim here is to identify the greatest potential sources of uncertainty
[Figure: four panels of NEP (g/m2/y, 150 to 190) against max. age (160 to 200 years), water potential (1.8 to 2.6 MPa), leaf life span (160 to 200 days) and minimum growth rate (0.0035 to 0.0045 m)]
Leaf life span 69.1%
Minimum growth rate 14.2%
Water potential 3.4%
Maximum age 1.0%
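One common way to arrive at percentages of this kind is to divide the variance of each input's main effect by the total output variance. Whether the figures above were computed exactly this way is an assumption; the sketch below only illustrates the idea on a hypothetical `toy_model` with independent uniform inputs.

```python
import numpy as np

def toy_model(x):
    """Hypothetical additive model; x has shape (n, 3)."""
    return 4.0 * x[:, 0] + 2.0 * x[:, 1] + 1.0 * x[:, 2]

def main_effect_shares(model, n_inputs, n_grid=200, n_samples=20_000, seed=0):
    """Variance of each main effect as a fraction of total output variance."""
    rng = np.random.default_rng(seed)
    # Assumed input distribution: independent uniforms on [0, 1].
    base = rng.uniform(size=(n_samples, n_inputs))
    total_var = model(base).var()
    # Midpoint grid standing in for the uniform distribution of the fixed input.
    grid = (np.arange(n_grid) + 0.5) / n_grid
    shares = np.empty(n_inputs)
    for i in range(n_inputs):
        cond_means = np.empty(n_grid)
        for j, value in enumerate(grid):
            x = base.copy()
            x[:, i] = value
            cond_means[j] = model(x).mean()
        shares[i] = cond_means.var() / total_var
    return shares

shares = main_effect_shares(toy_model, n_inputs=3)
for name, share in zip(["input 0", "input 1", "input 2"], shares):
    print(f"{name}: {100.0 * share:.1f}%")
```

For this additive toy model the shares sum to roughly 100%. In general they need not, because interactions between inputs also contribute to the output variance.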
Plant Functional Type parameters
Uncertainty is driven by just a few key parameters
– Maximum age
– Leaf life span
– Water potential
– Minimum growth rate
The next step was to refine the rough probability distributions for these parameters
Elicitation
We elicited formal probability distributions for the key parameters
– based on discussion with Ian Woodward
– representing his uncertainty about their values within the UK
– noting that each really applies as an average over the species actually present in a given pixel
[Figure: elicited probability distributions for leaf life span (days), minimum growth rate (m), maximum age (years) and water potential (MPa)]
Uniform probability distributions
– Leaf life span 69.1%
– Minimum growth rate 14.2%
– Water potential 3.4%
– Maximum age 1.0%
– Mean NEP = 174 gCm-2
– Std deviation = 14.32 gCm-2
Refined probability distributions
– Minimum growth rate 64%
– Leaf life span 13.2%
– Seeding density 10%
– Water potential 3.3%
– Maximum age 1%
– Mean NEP = 163 gCm-2
– Std deviation = 12.65 gCm-2
Uncertainty analysis at sample sites
We computed uncertainty analyses on NEP outputs from SDGVM for 9 sites/pixels
[Figure: NEP uncertainty at each site, on an NEP axis from 20 to 270]
– Stockton on the Forest (Nr York)
– Milton Keynes
– Barnstaple (Devon)
– Keswick (Lake District)
– Lowland (Scotland)
– Dartmoor
– New Forest (Hampshire)
– Kielder
– S. Ballater (Scotland)
Uncertainty is clearly substantial, even when we only take account of uncertainty in these parameters
The most important parameter is minimum growth rate, which typically accounts for at least 60% of overall NEP uncertainty
– This suggests targeting this parameter for research
Seeding density?
Ongoing work
We need to estimate uncertainty in the overall UK carbon budget
– Developing new theory for aggregating uncertainty over many pixels
Windows software will be made available later this year
www.shef.ac.uk/st1mck