

Atmospheric Downscaling using Genetic Programming

Tanja Zerenner (1), Victor Venema (1), Petra Friederichs (1), Clemens Simmer (1)

(1) Meteorological Institute, University of Bonn, Germany

1. Motivation

Figure 1: Scale differences in TerrSysMP.

The Transregional Collaborative Research Centre 32 (TR 32) has developed an integrated modeling system, TerrSysMP, consisting of the atmospheric model COSMO, the land-surface model CLM, and the hydrological model ParFlow. These component models are usually operated at different resolutions in space and time. Thus up- and downscaling procedures are required at the interfaces between atmospheric and land-surface/subsurface models.

2. Method

We develop a mixed physical/statistical downscaling scheme from a training data set of high-resolution model runs via multiobjective symbolic regression using Genetic Programming (GP).

Discretization and similar approximations induce uncertainty in the models. Hence we do not try to reproduce the 'exact' high-resolution model output fields, but 'realistic' ones.

Symbolic Regression

Given a sample data set {X, Y}, the aim is to find a function that maps X to Y. In symbolic regression the form of the regression function (linear, polynomial, ...) is not known in advance; it is part of the search.

Genetic Programming

GP originates from machine learning: from a set of functions (arithmetic expressions, IF-statements, etc.) and terminals (constants or variables), GP generates potential solutions to a given problem while minimizing a fitness (cost, error) function.
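To make the idea of functions and terminals concrete, the following Python sketch represents a candidate solution as an expression tree and evaluates it on sample data. It is an illustration only, not the Matlab/GPLAB code used for the poster. The terminal names `T`, `h` and `z0`, the convention that protected division returns 1 for a zero denominator, and the semantics of `if` (first branch when the condition is positive) are assumptions made for this example.

```python
import random


def protected_div(a, b):
    """Division that returns 1.0 when the denominator is zero (assumed convention)."""
    return a / b if b != 0 else 1.0


# Function set: name -> (number of arguments, implementation).
FUNCTIONS = {
    "add": (2, lambda a, b: a + b),
    "sub": (2, lambda a, b: a - b),
    "mul": (2, lambda a, b: a * b),
    "div": (2, protected_div),
    # Assumed semantics: return x if the condition is positive, else y.
    "if": (3, lambda c, x, y: x if c > 0 else y),
}


def evaluate(tree, variables):
    """Evaluate a tree for one sample; variables maps terminal names to floats."""
    if isinstance(tree, tuple):          # inner node: (function name, child, ...)
        name, *children = tree
        _, func = FUNCTIONS[name]
        return func(*(evaluate(child, variables) for child in children))
    if isinstance(tree, str):            # terminal: variable
        return variables[tree]
    return float(tree)                   # terminal: constant


def random_tree(rng, terminals, max_depth):
    """Grow a random tree from the function set and the given terminals."""
    if max_depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return rng.choice(terminals)
        return rng.uniform(-1.0, 1.0)
    name = rng.choice(list(FUNCTIONS))
    arity, _ = FUNCTIONS[name]
    return (name,) + tuple(random_tree(rng, terminals, max_depth - 1) for _ in range(arity))


def rmse_fitness(tree, samples, targets):
    """Fitness to be minimized: root mean square error over all samples."""
    errors = [(evaluate(tree, s) - t) ** 2 for s, t in zip(samples, targets)]
    return (sum(errors) / len(errors)) ** 0.5


# Hypothetical candidate: T + 0.5 * (h / z0)
tree = ("add", "T", ("mul", 0.5, ("div", "h", "z0")))
samples = [{"T": 280.0, "h": 2.0, "z0": 1.0}, {"T": 281.0, "h": 4.0, "z0": 0.0}]
print([evaluate(tree, s) for s in samples])  # [281.0, 281.5]
```

In the second sample the denominator is zero, so the protected division contributes 1 instead of raising an error; this is what keeps randomly generated expressions evaluable.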

Figure 2: A Pareto front for a maximization problem with two objectives.

Pareto Optimality

When dealing with multiple objectives, often there is no solution which is optimal in the absolute sense (i.e. in every objective).

An n-tuple $x = (x_1, x_2, \ldots, x_n)$ is called a Pareto optimum of a set $A$ of n-tuples if there is no n-tuple $y = (y_1, y_2, \ldots, y_n)$ in $A$ with $y_i \geq x_i$ for all $i = 1, 2, \ldots, n$ and $y_i > x_i$ for at least one $i$ (for a maximization problem!).
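The definition can be written down directly in a few lines of Python. The sketch below is illustrative; the function names are hypothetical, and it follows the maximization convention of the definition. For objectives that are minimized (such as the error measures used later), the inequalities are reversed or the objective values negated.

```python
def dominates(y, x):
    """True if y dominates x for a maximization problem:
    y_i >= x_i for all i and y_i > x_i for at least one i."""
    return (all(yi >= xi for yi, xi in zip(y, x))
            and any(yi > xi for yi, xi in zip(y, x)))


def pareto_front(points):
    """Return all points of the set that are not dominated by any other point."""
    return [x for x in points if not any(dominates(y, x) for y in points)]


points = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)]
print(pareto_front(points))  # [(1, 5), (2, 4), (3, 3)]
```

The check is quadratic in the number of points, which is unproblematic for sets of the size used here (a Pareto set of at most 50 solutions).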

Implementation

Our code is based on the GPLAB package for Matlab (Silva and Almeida, 2003). For multiobjective fitness assignment we have integrated the Strength Pareto Approach (SPEA) by Zitzler and Thiele (1999).
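The following Python sketch outlines a strength-based fitness assignment in the way SPEA is usually described, written for objectives that are minimized. It is an illustration under stated assumptions, not the code integrated into GPLAB: the function names are hypothetical, "covers" is taken to mean weak dominance, the strength of an external (nondominated) solution is taken as the fraction n / (N + 1) of the N population members it covers, and the fitness of a population member is 1 plus the strengths of all external solutions covering it.

```python
def covers(a, b):
    """True if a is at least as good as b in every (minimized) objective."""
    return all(ai <= bi for ai, bi in zip(a, b))


def spea_fitness(external, population):
    """Return (strengths of external solutions, fitness of population members).

    Both fitness and strength are to be minimized. Assumed formulation:
    strength = (# covered population members) / (N + 1),
    fitness  = 1 + sum of strengths of covering external solutions.
    """
    n = len(population)
    strengths = [sum(covers(e, p) for p in population) / (n + 1) for e in external]
    fitness = [
        1.0 + sum(s for e, s in zip(external, strengths) if covers(e, p))
        for p in population
    ]
    return strengths, fitness


# Hypothetical objective vectors, e.g. (RMSE, ME(STD)).
external = [(1.0, 3.0), (3.0, 1.0)]
population = [(2.0, 4.0), (4.0, 2.0), (4.0, 4.0), (0.5, 5.0)]
strengths, fitness = spea_fitness(external, population)
```

In this example the solution (4.0, 4.0) is covered by both external solutions and therefore receives the worst fitness, while (0.5, 5.0) is covered by none and keeps the best possible value of 1.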

3. Temperature Downscaling

We illustrate our method using the problem of downscaling near-surface temperature during clear-sky nights. The downscaling scheme currently implemented in TerrSysMP (Schomburg et al., 2010) does not contain a regression step for this weather situation.

3.1 Set Up

Predictors

High-res. surface information      Coarse weather information
topography & 6 derived params.     near-surface temperature
plant cover                        near-surf. vert. temp. gradient
roughness length                   near-surf. turbulent kinetic energy
                                   near-surf. horizontal windspeed
                                   cloud cover at 3 heights

Training Data

COSMO model output at 400 m resolution for 27 timesteps and a domain size of 280 × 280 grid points, i.e. 40 × 40 grid points at the coarse (2.8 km) scale.
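The relation between the two grids (7 × 7 fine pixels of 400 m per coarse pixel of 2.8 km) can be illustrated with a block average. The poster does not state how the coarse fields are derived from the high-resolution output, so the averaging below is an assumption made purely for illustration; `block_mean` is a hypothetical helper operating on a field stored as a list of rows.

```python
def block_mean(field, block=7):
    """Average a 2D field over non-overlapping block x block pixels."""
    ny, nx = len(field), len(field[0])
    if ny % block or nx % block:
        raise ValueError("field dimensions must be multiples of the block size")
    coarse = []
    for j in range(0, ny, block):
        row = []
        for i in range(0, nx, block):
            values = [field[jj][ii]
                      for jj in range(j, j + block)
                      for ii in range(i, i + block)]
            row.append(sum(values) / len(values))
        coarse.append(row)
    return coarse


# Synthetic 280 x 280 field standing in for one high-resolution timestep.
fine = [[float(j * 280 + i) for i in range(280)] for j in range(280)]
coarse = block_mean(fine)
print(len(coarse), len(coarse[0]))  # 40 40
```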

Objectives

Root Mean Square Error: RMSE

Mean Error of Standard Deviation: ME(STD)

$$\mathrm{ME(STD)} = \mathrm{MEAN}\left( \left| \mathrm{STD}_{7\times7}(T_d) - \mathrm{STD}_{7\times7}(T_t) \right| \right)$$

with $T_d$ denoting the downscaled temperature and $T_t$ denoting the 'true' temperature. $\mathrm{STD}_{7\times7}$ denotes the (fine-scale) standard deviation within each coarse pixel, i.e. over its 7 × 7 fine-scale pixels.
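A minimal Python sketch of the two objectives, assuming fields stored as lists of rows and the population standard deviation within each coarse pixel (the poster does not specify which estimator is used). The function names are hypothetical.

```python
import math
import random
from statistics import pstdev


def rmse(t_d, t_t):
    """Root mean square error between downscaled and 'true' field."""
    squared = [(d - t) ** 2
               for row_d, row_t in zip(t_d, t_t)
               for d, t in zip(row_d, row_t)]
    return math.sqrt(sum(squared) / len(squared))


def block_std(field, block=7):
    """Standard deviation of the fine-scale values inside each coarse pixel."""
    ny, nx = len(field), len(field[0])
    return [
        [pstdev([field[jj][ii]
                 for jj in range(j, j + block)
                 for ii in range(i, i + block)])
         for i in range(0, nx, block)]
        for j in range(0, ny, block)
    ]


def me_std(t_d, t_t, block=7):
    """Mean absolute difference of the sub-grid standard deviations."""
    std_d, std_t = block_std(t_d, block), block_std(t_t, block)
    diffs = [abs(a - b)
             for row_a, row_b in zip(std_d, std_t)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


# Synthetic 14 x 14 'true' field (2 x 2 coarse pixels) for demonstration.
rng = random.Random(0)
t_true = [[280.0 + rng.random() for _ in range(14)] for _ in range(14)]
t_shifted = [[v + 2.0 for v in row] for row in t_true]   # biased, same variability
t_smooth = [[280.5] * 14 for _ in range(14)]             # no sub-grid variability
```

The two test fields show why both objectives are used: the shifted field has a large RMSE but an ME(STD) of practically zero, whereas the smooth field has a small RMSE but a clearly nonzero ME(STD) because it lacks fine-scale variability.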

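Earth Mover's Distance: EMD

The EMD is computed for histograms of temperature values (bin width = 0.25 K) of full fields at single timesteps. It is a measure for the 'distance' between two histogram distributions.

Figure 3: Sketch of concept of histogram differences.

$$\mathrm{EMD}_0 = 0$$
$$\mathrm{EMD}_{i+1} = (A_i + \mathrm{EMD}_i) - B_i$$
$$\mathrm{EMD} = \sum_i \left| \mathrm{EMD}_i \right|$$

Here $A_i$ and $B_i$ denote the contents of bin $i$ of the two histograms being compared. As objective we take the mean EMD over all training data fields, i.e. the EMD is computed at each timestep and then averaged.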
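The recursion above amounts to carrying the surplus of one histogram from bin to bin and summing the absolute amounts carried. A minimal Python sketch follows; the function names are hypothetical, and normalizing the histograms to relative frequencies is an assumption, since the poster does not state whether counts or frequencies are used.

```python
import math


def histogram_emd(a, b):
    """EMD between two histograms with identical bins, following
    EMD_0 = 0, EMD_{i+1} = (A_i + EMD_i) - B_i, EMD = sum_i |EMD_i|."""
    carried, total = 0.0, 0.0
    for a_i, b_i in zip(a, b):
        carried = (a_i + carried) - b_i
        total += abs(carried)
    return total


def histogram(values, lo, bin_width, n_bins):
    """Relative frequencies of values in n_bins bins of equal width starting at lo."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - lo) // bin_width)
        counts[max(0, min(k, n_bins - 1))] += 1
    return [c / len(values) for c in counts]


def field_emd(t_d, t_t, bin_width=0.25):
    """EMD between the temperature histograms of two fields (lists of rows)."""
    flat_d = [v for row in t_d for v in row]
    flat_t = [v for row in t_t for v in row]
    lo = math.floor(min(flat_d + flat_t) / bin_width) * bin_width
    hi = max(flat_d + flat_t)
    n_bins = int((hi - lo) // bin_width) + 1
    return histogram_emd(histogram(flat_d, lo, bin_width, n_bins),
                         histogram(flat_t, lo, bin_width, n_bins))


print(histogram_emd([1, 0, 0], [0, 0, 1]))  # 2.0
```

In the printed example one unit has to be moved across two bins, giving an EMD of 2. With normalized histograms the result is expressed in units of bins, so a field whose histogram is shifted by exactly one bin (0.25 K) has an EMD of 1.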

GP settings

Parameter               Value
function set            +, -, *, protected /, if
generations             200
population size         100
max. pareto set size    50
genetic operators       mutation, crossover
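The two genetic operators listed in the table can be sketched on expression trees stored as nested tuples of the form `(function name, child, ...)`. This is a simplified Python illustration with hypothetical names, not GPLAB's implementation: crossover grafts a random subtree of one parent into a random position of the other, and mutation here simply replaces a random subtree by a random terminal.

```python
import random


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))


def replace(tree, path, new):
    """Return a copy of tree in which the node at path is replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(parent_a, parent_b, rng):
    """Graft a random subtree of parent_b into a random position of parent_a."""
    path_a, _ = rng.choice(list(subtrees(parent_a)))
    _, donor = rng.choice(list(subtrees(parent_b)))
    return replace(parent_a, path_a, donor)


def mutate(parent, rng, terminals=("T", "h", "z0")):
    """Simplified mutation: replace a random subtree by a random terminal."""
    path, _ = rng.choice(list(subtrees(parent)))
    return replace(parent, path, rng.choice(terminals))


rng = random.Random(0)
parent_a = ("add", "T", ("mul", "h", 2.0))
parent_b = ("sub", "h", "T")
child = crossover(parent_a, parent_b, rng)
mutant = mutate(parent_a, rng)
```

Because the trees are immutable tuples, both operators return new individuals and leave the parents unchanged, which keeps a population of 100 individuals over 200 generations easy to manage.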

3.2 Results

Figure 4: Example for downscaling a near-surface temperature field using one GP solution. Panels: (a) coarse, (b) interpolated, (c) interpolated + downscaled, (d) high-resolution ('true'); color scale: temperature [K], 278 to 292. Shown is a nighttime temperature field of 112 × 112 km within North Rhine-Westphalia in Germany [50.56°-51.03° lat, 6.06°-6.83° lon].

Figure 5: Scatterplots for the GP solution from Fig. 4: GP output vs. reference ('true') values. Panels: temp. anomaly (predicted) [K] vs. temp. anomaly (true) [K], and STD of temperature (predicted) [K] vs. STD of temperature (true) [K]. Shown are 2500 randomly chosen points.

Figure 6: Cross section for GP solution from Fig. 4.

Figure 7: Values of the objectives for the 50 solutions of the final Pareto set. Panels: ME(STD) [K] vs. RMSE [K], EMD vs. RMSE [K], and EMD vs. ME(STD) [K].

Table 1: Values of the objectives for the GP solution from Fig. 4.

               interp.   downsc.
RMSE [K]       0.70      0.84
ME(STD) [K]    0.58      0.22
EMD            1.75      0.34

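4. Conclusion

Our preliminary results show that realistic fine-scale structures can be retrieved from the coarse-scale input, which constitutes a major advance compared to the usually applied interpolation methods. For the example solution in Table 1, ME(STD) decreases from 0.58 K to 0.22 K and EMD from 1.75 to 0.34 relative to interpolation, while RMSE increases from 0.70 K to 0.84 K.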

5. Outlook

Expansion of training and validation data sets.
Find and test further objectives (for better quantification of spatio-temporal characteristics of fine-scale fields).
Downscale the remaining required atmospheric variables.
Implement the downscaling in TerrSysMP.
Downscaling ensemble.

Acknowledgements

We gratefully acknowledge the financial support from Transregio 32 'Patterns in Soil-Vegetation-Atmosphere Systems' funded by the 'Deutsche Forschungsgemeinschaft' (DFG). Furthermore we would like to thank Annika Schomburg for providing training data and COSMO model support.

References

Schomburg, Annika, et al. "A downscaling scheme for atmospheric variables to drive soil-vegetation-atmosphere transfer models." Tellus B 62.4 (2010): 242-258.

Silva, Sara, and Jonas Almeida. "GPLAB - a genetic programming toolbox for MATLAB." Proceedings of the Nordic MATLAB conference. 2003.

Zitzler, Eckart, and Lothar Thiele. "Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach." IEEE Transactions on Evolutionary Computation 3.4 (1999): 257-271.

Koza, John R. Genetic Programming: vol. 1, On the programming of computers by means of natural selection. MIT Press, 1992.

contact: [email protected]