2020.01.30 Data Assimilation Workshop
Application of machine learning methods to model bias correction:
Lorenz-96 model experiments

Arata Amemiya*, Takemasa Miyoshi
Data Assimilation Research Team, RIKEN Center for Computational Science
Shlok Mohta
Graduate School of Science, The University of Tokyo
Motivation: bias in weather and climate models
Weather and climate models have model biases from various sources:
• Truncation error
• Approximation of unresolved physical processes
  - Convection
  - Small-scale topography
  - Turbulence
  - Cloud microphysics

(Figure from the JMA website)
Treatment of forecast error in data assimilation
Kalman filter:

Forecast step (update the state and the forecast error covariance):
𝒙ᶠ_{t+1} = 𝓜(𝒙ᵃ_t)
𝑷ᶠ_{t+1} = 𝑴 𝑷ᵃ_t 𝑴ᵀ,  where 𝑴 = ∂𝓜/∂𝒙

Analysis step:
1. Calculate the Kalman gain:  𝑲 = 𝑷ᶠ𝑯ᵀ (𝑯𝑷ᶠ𝑯ᵀ + 𝑹)⁻¹
2. Calculate the analysis state and error covariance:
   𝒙ᵃ = 𝒙ᶠ + 𝑲 (𝒚 − 𝐻(𝒙ᶠ))
   𝑷ᵃ = (𝑰 − 𝑲𝑯) 𝑷ᶠ

(Schematic: previous analysis → forecast mean → analysis mean, drawn toward the observation)
http://www.data-assimilation.riken.jp/jp/research/index.html

Model bias leads to the underestimation of the forecast (background) error.
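The analysis step above can be sketched in a few lines of NumPy (a minimal illustration for a linear observation operator 𝑯; function and variable names are mine, not from the slides):

```python
import numpy as np

def kalman_analysis(xf, Pf, y, H, R):
    """One Kalman filter analysis step (linear observation operator H).

    xf : forecast state (n,)        Pf : forecast error covariance (n, n)
    y  : observation (p,)           H  : observation operator (p, n)
    R  : observation error covariance (p, p)
    """
    # Kalman gain: K = Pf H^T (H Pf H^T + R)^-1
    S = H @ Pf @ H.T + R
    K = Pf @ H.T @ np.linalg.inv(S)
    # Analysis state and analysis error covariance
    xa = xf + K @ (y - H @ xf)
    Pa = (np.eye(len(xf)) - K @ H) @ Pf
    return xa, Pa
```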
Treatment of imperfect model
Insufficiently represented model error degrades the performance of the Kalman filter. Two common remedies:

1. Covariance inflation
   - Multiplicative inflation:  𝑷ᵃ → α 𝑷ᵃ
   - Additive inflation:  𝑷ᵃ → 𝑷ᵃ + 𝑸
   - Relaxation-to-prior:  𝑷ᵃ → (1 − α) 𝑷ᵃ + α 𝑷ᶠ

2. Correction of the systematic bias component:  𝒙ᶠ_{t+1} → 𝒙ᶠ_{t+1} + 𝒃
   followed by the usual analysis update
   𝒙ᵃ_{t+1} = 𝒙ᶠ_{t+1} + 𝑲 (𝒚_{t+1} − 𝐻(𝒙ᶠ_{t+1}))
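For concreteness, the three inflation variants can be written as one-line covariance transforms (a sketch; `alpha` and `Q` are the α and 𝑸 above):

```python
import numpy as np

def multiplicative_inflation(Pa, alpha):
    return alpha * Pa                        # Pa -> alpha * Pa

def additive_inflation(Pa, Q):
    return Pa + Q                            # Pa -> Pa + Q

def relaxation_to_prior(Pa, Pf, alpha):
    return (1.0 - alpha) * Pa + alpha * Pf   # Pa -> (1-alpha) Pa + alpha Pf
```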
Bias correction with simple functional form
✓ "Offline" bias correction

True system:     d𝒙/dt = 𝒇_true(𝒙)
Forecast model:  d𝒙/dt = 𝒇_model(𝒙)

Starting both from the true state 𝒙ᵗ(t), the one-step forecast error is
δ𝒙 = 𝒙ᵗ(t + Δt) − 𝒙ᶠ(t + Δt)

From a set of training data {δ𝒙, 𝒙ᶠ}, estimate a bias correction term 𝑫(𝒙) and integrate the corrected model
d𝒙/dt = 𝒇_model(𝒙) + 𝑫(𝒙)

Simplest form: linear dependency
𝑫(𝒙) = 𝑫₀ + 𝑳𝒙′,  with 𝒙′ = 𝒙ᶠ − mean(𝒙ᶠ)
• Steady component:  𝑫₀ = mean(δ𝒙) ∕ Δt
• Linearly-dependent component (Leith, 1978):  𝑳𝒙′ = 𝑪_{δ𝒙,𝒙} 𝑪_{𝒙,𝒙}⁻¹ 𝒙′ ∕ Δt, where 𝑪 denotes a covariance matrix

Dimensionality reduction can be applied using Singular Value Decomposition (SVD) (Danforth et al. 2007).
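The steady and linearly-dependent components can be estimated from training pairs {δ𝒙, 𝒙ᶠ} by a covariance regression; a minimal NumPy sketch (function and variable names are mine):

```python
import numpy as np

def estimate_bias_correction(dx, xf, dt):
    """Estimate D(x) = D0 + L x' from training pairs (dx, xf).

    dx : one-step forecast errors delta-x, shape (nsamples, n)
    xf : the corresponding forecasts,      shape (nsamples, n)
    """
    D0 = dx.mean(axis=0) / dt                  # steady component
    xp = xf - xf.mean(axis=0)                  # forecast anomalies x'
    dxp = dx - dx.mean(axis=0)
    # Leith (1978)-type regression: L = C_{dx,x} C_{x,x}^{-1} / dt
    C_dx_x = dxp.T @ xp / len(xf)
    C_x_x = xp.T @ xp / len(xf)
    return D0, C_dx_x @ np.linalg.inv(C_x_x) / dt
```

On synthetic data generated exactly as δ𝒙 = Δt (𝒂 + 𝑩𝒙′), this recovers 𝑫₀ = 𝒂 and 𝑳 = 𝑩.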
Bias correction with nonlinear basis functions
Higher-order polynomials:
• Coupled Lorenz96 system (Wilks et al. 2005; Arnold et al. 2013)
• Real case: all-sky satellite infrared brightness temperature (Otkin et al. 2018)

Neural networks:
• Coupled Lorenz96 system (Watson et al. 2019)

(Figure: probability of (obs − fcst) versus obs; Fig. 2 of Otkin et al. 2018)
Online bias correction
✓ "Online" bias correction = simultaneous estimation of the state variables and the bias correction terms

• Kalman filter: sequential treatment / augmented state
  - Steady component (Dee and Da Silva 1998; Baek et al. 2006)
  - Polynomials (Pulido et al. 2018)
• Variational data assimilation ("VarBC"): penalty function for the bias-corrected observation error plus a penalty function for regularization
  - Legendre polynomials (Cameron and Bell, 2016; for satellite sounding in the UK Met Office operational model)
Localization
In a high-dimensional spatiotemporal system, geographically (and temporally) local interactions are usually dominant.

Localization is built into the LETKF:
✓ Reduced matrix size → low cost
✓ Highly effective parallelization (Miyoshi and Yamane, 2007)
It is also used in simultaneous parameter estimation (Aksoy et al. 2006).

Localization is also used in ML-based data-driven modelling (Pathak et al. 2018; Watson et al. 2019).

(Figure from Pathak et al. 2018)
The goal of this study
✓ ML-based online bias correction using Long Short-Term Memory (LSTM)
✓ Combined with the LETKF, using a similar localization

Test experiments with the coupled Lorenz96 model:
• Experimental setup
• Online bias correction with simple linear regression as a reference
• (Online bias correction with LSTM)
Bias correction in the LETKF system

One assimilation cycle:
1. Model forecast:  𝒙ᶠ_{t+1} = 𝓜(𝒙ᵃ_t)
2. Bias correction:  𝒙ᶠ_{t+1} ← 𝒃(𝒙ᶠ_{t+1})
3. LETKF analysis with the observation 𝒚_{t+1}:
   𝒙ᵃ_{t+1} = 𝒙ᶠ_{t+1} + 𝑲 (𝒚_{t+1} − 𝐻(𝒙ᶠ_{t+1}))
4. Update the bias correction function 𝒃(𝒙ᶠ)
Lorenz96 model (Lorenz 1996)

d𝑥_k/dt = 𝑥_{k−1}(𝑥_{k+1} − 𝑥_{k−2}) − 𝑥_k + F,  k = 1, 2, …, K

• 1-D cyclic domain (𝑥_{k+K} = 𝑥_k)
• Steady solution 𝑥_k = F; chaotic behavior for sufficiently large F
• Standard setting: K = 40, F = 8

(Figures: analogy with the midlatitude flow — wind at 250 hPa, NCEP GFS, https://earth.nullschool.net/ — and plots of the largest Lyapunov exponent and the mean 𝑥(t).)
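The tendency equation is easy to implement with cyclic index shifts; a minimal NumPy sketch (the RK4 integrator is an assumption, as the slides do not state the time-stepping scheme):

```python
import numpy as np

def lorenz96_tendency(x, F=8.0):
    """dx_k/dt = x_{k-1} (x_{k+1} - x_{k-2}) - x_k + F on a cyclic domain."""
    return np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2)) - x + F

def rk4_step(f, x, dt):
    """One 4th-order Runge-Kutta step."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
```

Note that the steady solution 𝑥_k = F gives a vanishing tendency, as expected.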
Coupled Lorenz96 model
"Nature run" (Wilks, 2005): multi-scale interaction between large-scale (slow) variables 𝑥_k and small-scale (fast) variables 𝑦_j through coupling terms:

d𝑥_k/dt = 𝑥_{k−1}(𝑥_{k+1} − 𝑥_{k−2}) − 𝑥_k + F − (hc∕b) Σ_{j=J(k−1)+1}^{kJ} 𝑦_j,  k = 1, 2, …, K
d𝑦_j/dt = −cb 𝑦_{j+1}(𝑦_{j+2} − 𝑦_{j−1}) − c 𝑦_j + (hc∕b) 𝑥_{int[(j−1)∕J]+1},  j = 1, 2, …, KJ

Each 𝑥_k is coupled to the J fast variables 𝑦_{J(k−1)+1}, …, 𝑦_{kJ}. (Schematic drawn with K = 8.)

Forecast model (Danforth and Kalnay, 2008): only the large-scale variables, with an additional sinusoidal forcing:

d𝑥_k/dt = 𝑥_{k−1}(𝑥_{k+1} − 𝑥_{k−2}) − 𝑥_k + F + A sin(2πk∕K)

Parameters used in this study: K = 16, J = 16, h = 1, b = 20, c = 50, F = 16, A = 1
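A NumPy sketch of the coupled tendencies using the parameters above (the 0-based array layout, with y[kJ:(k+1)J] coupled to x[k], is an assumption):

```python
import numpy as np

def coupled_lorenz96_tendency(x, y, F=16.0, h=1.0, b=20.0, c=50.0):
    """Tendencies of the coupled two-scale Lorenz96 system.

    x : slow variables, shape (K,)
    y : fast variables, shape (K*J,); y[k*J:(k+1)*J] couple to x[k].
    """
    K = len(x)
    J = len(y) // K
    # Coupling: each x_k feels the sum of its J fast variables
    coupling = (h * c / b) * y.reshape(K, J).sum(axis=1)
    dx = np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2)) - x + F - coupling
    # Fast variables: note the reversed direction of the advection-like term
    dy = (-c * b * np.roll(y, -1) * (np.roll(y, -2) - np.roll(y, 1))
          - c * y + (h * c / b) * np.repeat(x, J))
    return dx, dy
```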
Data assimilation experiment
"Observation" = "Nature run" + random error
• Observation operator: identity (obs = model grid)
• Error standard deviation: 0.1
• Interval: 0.025 / 0.05 / 0.1 (cf. error doubling time ≃ 0.2)

LETKF configuration:
• Members: 20
• Localization: Gaussian weighting (length scale = 3 grids)
• Covariance inflation: multiplicative (factor α)

"Control run" (data assimilation without bias correction): a large inflation factor α is needed to stabilize the DA cycle.

Obs interval   Minimum RMSE   Inflation α
0.025          0.0760          2.9
0.05           0.0851          5.8
0.1            0.0902         15.0
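The Gaussian localization weighting on the cyclic grid might look as follows (a sketch; the slides give only "Gaussian weighting" with length scale 3 grids, so the exact functional form is assumed):

```python
import numpy as np

def gaussian_localization_weights(i, K=16, scale=3.0):
    """Gaussian weights around analysis grid point i on the cyclic domain."""
    k = np.arange(K)
    d = np.minimum(np.abs(k - i), K - np.abs(k - i))  # cyclic grid distance
    return np.exp(-0.5 * (d / scale) ** 2)
```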
Example: simple linear regression
One assimilation cycle with online linear-regression bias correction:
1. Model forecast:  𝒙ᶠ_{t+1} = 𝓜(𝒙ᵃ_t)
2. Bias correction:  𝒙ᶠ_{t+1} ← 𝑾ᶠ_{t+1} 𝒙ᶠ_{t+1} + 𝒃ᶠ_{t+1}
3. LETKF analysis with the observation 𝒚_{t+1} = 𝐻(𝒙ᵗʳᵘᵉ_{t+1}) + 𝝐_obs:
   𝒙ᵃ_{t+1} = 𝒙ᶠ_{t+1} + 𝑲 (𝒚_{t+1} − 𝐻(𝒙ᶠ_{t+1}))
Online bias correction by linear regression
Analysis update of the bias-correction coefficients, using the same innovation as the state update:
𝒃ᵃ_{t+1} = 𝒃ᶠ_{t+1} + β 𝑲 (𝒚_{t+1} − 𝐻(𝒙ᶠ_{t+1}))
𝑾ᵃ_{t+1} = 𝑾ᶠ_{t+1} + β 𝑲 (𝒚_{t+1} − 𝐻(𝒙ᶠ_{t+1})) ⋅ (𝒙ᶠ_{t+1})ᵀ ∕ |𝒙ᶠ_{t+1}|²

Forecast step with a forgetting factor γ:
𝒃ᶠ_{t+1} = γ 𝒃ᵃ_t
𝑾ᶠ_{t+1} = γ 𝑾ᵃ_t
Results (β = 0.02, γ = 0.999; half-life: ~37):

• The analysis increment 𝒙ᵃ − 𝒙ᶠ has a large negative correlation with 𝒙ᶠ, and is mostly offset by the bias correction term (the corrected minus the raw forecast).
• The DA cycle is stable with a smaller inflation factor α:

Control run:
Interval (day)   Min. infl.   RMSE
0.025             2.9         0.0760
0.1              15.0         0.0902

With bias correction:
Interval (day)   Min. infl.   RMSE
0.025             2.0         0.0625
0.1               8.2         0.0849

(Figure: the correction term over the time windows 0-30, 120-150, 240-270, and 360-390, compared with the bias, i.e. forecast (before correction) minus analysis.)
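A sketch of one online update of (𝑾, 𝒃). The analysis nudge follows the equations above; relaxing 𝑾 toward the identity in the forgetting step is my assumption, since the extracted slides leave the exact decay form ambiguous:

```python
import numpy as np

def update_bias_correction(W, b, xf, increment, beta=0.02, gamma=0.999):
    """One online update of the linear bias correction x -> W x + b.

    increment : K (y - H(x^f)), the analysis increment from the LETKF.
    """
    # Analysis update: nudge b and W toward explaining the increment
    b_a = b + beta * increment
    W_a = W + beta * np.outer(increment, xf) / np.dot(xf, xf)
    # Forecast step: forgetting factor gamma (relaxation of W toward
    # the identity is an assumption of this sketch)
    return gamma * W_a + (1.0 - gamma) * np.eye(len(xf)), gamma * b_a
```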
Nonlinear bias correction with ML
One assimilation cycle with an RNN-based correction:
1. Model forecast:  𝒙ᶠ_{t+1} = 𝓜(𝒙ᵃ_t)
2. Bias correction by the RNN:  𝒙ᶠ_{t+1} ← 𝒃(𝒙ᶠ_{t+1}, …)
3. LETKF analysis with the observation 𝒚_{t+1}:
   𝒙ᵃ_{t+1} = 𝒙ᶠ_{t+1} + 𝑲 (𝒚_{t+1} − 𝐻(𝒙ᶠ_{t+1}))
4. Train the network 𝒃 with the pairs (𝒙ᶠ_{t+1}, 𝒙ᵃ_{t+1})
Plan: LSTM implementation

Network architecture:
• 1 LSTM + 3 Dense layers: LSTM (nx = 15) → Dense (nx = 10) → Dense (nx = 5) → output (nx = 1)
• Activation: tanh / sigmoid (recurrent)
• No regularization / dropout
• LETKF-like localization:
  - Input: a localized area of forecasts 𝒙ᶠ_{t−Δt}, …, 𝒙ᶠ_t (nx = 9 grid points, nt = 5 time steps)
  - Output: the analysis 𝒙ᵃ_t at one grid point
• Training data: pairs (𝒙ᶠ_t, 𝒙ᵃ_t) from the Lorenz96-LETKF cycle

The LSTM is implemented in Python with TensorFlow and integrated with the LETKF codes.
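The described architecture could be written in Keras roughly as follows (a sketch under the stated assumptions; the Dense-layer activations are not specified on the slide, so tanh is assumed):

```python
import numpy as np
from tensorflow.keras import layers, models

def build_bias_correction_lstm():
    """LSTM bias-correction network as described on the slide.

    Input : local patch of nx = 9 grid points over nt = 5 forecast times.
    Output: corrected (analysis-like) value at the central grid point.
    """
    return models.Sequential([
        layers.LSTM(15, activation="tanh", recurrent_activation="sigmoid"),
        layers.Dense(10, activation="tanh"),  # activation assumed
        layers.Dense(5, activation="tanh"),   # activation assumed
        layers.Dense(1),                      # one output grid point
    ])

model = build_bias_correction_lstm()
model.compile(optimizer="adam", loss="mse")
# The first call builds the network for input shape (batch, nt=5, nx=9)
out = model(np.zeros((1, 5, 9), dtype=np.float32))
```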
Summary
• Systematic model bias degrades forecasts and analyses.
• "Offline" bias correction can be performed by ML as nonlinear regression.
• "Online" bias correction with data assimilation has been studied using a fixed basis function set.
• LSTM-based online bias correction is implemented and is to be tested.

Open questions:
• The efficiency of localization?
• Online learning?