Bill Campbell and Liz Satterfield JCSDA Summer Colloquium on Satellite Data Assimilation 27 Jul - 7 Aug 2015 Accounting for Correlated Satellite Observation

Bill Campbell and Liz Satterfield

JCSDA Summer Colloquium on Satellite Data Assimilation

27 Jul - 7 Aug 2015

Accounting for Correlated Satellite Observation Error in NAVGEM

1

Everything the data assimilation (DA) system knows about the error of every

observation everywhere is contained in R

• Observation error is defined as the difference between a measured quantity and the truth, which is the true state of the continuous atmosphere discretized, mapped into model space, then mapped into observation space (via interpolation, radiative transfer, etc.)

• Observation error variance is the time mean squared observation error, which is not observed directly, but computed statistically

• Observation error covariance is the time mean of the product of the errors of different observations (e.g. a temperature measurement error in Fort Collins and a wind speed measurement error in Denver)

• All covariance matrices are symmetric and positive definite• Covariance has physical units; correlation is normalized covariance, with

values between -1 and 1• Observation error covariance matrix R should contain the best estimates

of variances on the diagonal, covariances everywhere else3

Observation Error Defined

4

Sources of Observation Error

IMPERFECTOBSERVATIONS

True Temperature in Model Space

T=28° T=38° T=58°

T=30° T=44° T=61°

T=32° T=53° T=63°

T=44°

1) Instrument error (usually, but not always, uncorrelated)2) Mapping operator (H) error (interpolation, radiative transfer)3) Pre-processing, quality control, and bias correction errors4) Error of representation (sampling or scaling error), which can

lead to correlated error:

5

Correlated Error in Operational DA

• Until recently, most operation DA systems assumed no correlations between observations at different levels or locations (i.e., a diagonal R)

• To compensate for observation errors that are actually correlated, one or more of the following is typically done:– Discard (“thin”) observations until the remaining ones are

uncorrelated (Bergman and Bonner (1976), Liu and Rabier (2003))– Local averaging (“superobbing”) (Berger and Forsythe (2004))– Inflate the observation error variances (Stewart et al. (2008, 2013)

• Theoretical studies (e.g. Stewart et al., 2009 & 2013) indicate that including even approximate correlation structures outperforms diagonal R with variance inflation

• *In January, 2013, the Met Office went operational with a vertical observation error covariance submatrix for the IASI instrument, which showed forecast benefit in seasonal testing in both hemispheres (Weston et al. (2014))

6

Error Covariance Estimation Methods

• There are four statistical methods we are aware of for observation error covariance estimation: Hollingsworth & Lönnberg (1986) (H-L), Fisher(2003), Desroziers (2005), and Daescu (2013)

• H-L, Fisher (aka background error method), and Daescu can diagnose observation error variances (diag(R)), and H-L can diagnose vertical correlation, but none of these can diagnose horizontally-correlated observation error

• The background error method assumes that the forecast error covariance matrix B is essentially correct, which is difficult to validate

• Desroziers accounts for all types of correlated error, but the diagnosed R depends on the initial specification of the observation and forecast error covariance matrices (R and B, respectively)

• The diagnosed R must be made symmetric and positive definite, and it may be necessary to adjust its eigenvalue spectrum to address convergence issues

• None of these methods is perfect

7

Hollingsworth-Lönnberg Method(Hollingsworth and Lönnberg, 1986)

• Use innovation statistics from a dense observing network• Assume horizontally uncorrelated observation errors• Calculate a histogram of background innovation covariances binned by

horizontal separation• Fit an isotropic correlation model, extrapolate to zero separation to estimate

the correlated (forecast) and uncorrelated (observation) error partition

Correlation between observation error and forecast error: possibly induced by QC methods or bias correction methods

Observation error variance

Bias terms

Extrapolate red curve to zero separation, and compare with innovation variance (purple dot)

Mean of ob minus forecast (O-F) covariances, binned by separation distance

u2

c2

Assumes no spatially-correlated observation error

2 2 2o f innovation

22 ' '

22 2 2 ' '2

innovation o o f f

innovation o f o f o f

8

• From O-F, O-A, and A-F statistics from any model (e.g. NAVGEM), the observation error covariance matrix R, the representer HBHT, and their sum can be diagnosed

• This method is sensitive to the R and HBHT that is prescribed in the DA system

• An iterative approach may be necessary• Diagnose R1 , which will be different from the original R • Symmetrize R1 , possibly adjusting its eigenvalue spectrum• Implement R1 and run NAVGEM • Diagnose R2 , which we hope will be closer to Rtrue

Desroziers Method(Desroziers et al. 2005)

4DVar Primal Formulation

1

11 1 1 1

1 1 1

T Tf f

T T T Tf

T Tf

w x x BH HBH R y H x

B H R H w B H R H BH HBH R y H x

B H R H w H R y H x

Scale by B-1/21 2

1 2

s B w

w B s

1 2 1 1 1 2 1 2 1

1 2 1 1 2 1 2 1

T Tf

T Tf

B B H R H B s B H R y H x

I B H R HB s B H R y H x

4D-var iteration is on this problem -- We need to invert R!

4DVar Dual Formulation

Scale by R-1/2

1/2 1/2

,{ }i j

R R

R

R C

diag

C 4D-Var iteration is on this problem – No need to invert C!

Iteration is done on partial step and then mapped back with BHT

• An advantage of the dual formulation is that correlated observation error can be implemented directly

• No matrix inverse is required, which lifts some restrictions on the feasible size of a non-diagonal R

• In particular, implementing horizontally correlated observation error is significantly less challenging

11

Correlated Observation Error for the ATMS Microwave Sounder

ATMSMicrowave Sounder

13 temperature channels9 moisture channels

Channel Number

Cha

nnel

Num

ber

4 5 6 7 8 9 10 11 12 13 14 15 18 19 20 21 22

22

21

20

19

18

15

14

13

12

11

10

9

8

7

6

5

40

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Current observation error correlation matrix used for ATMS,

and for ALL observations

12

Error Covariance Estimationfor the ATMS

Statistical Estimate

Chan

nel N

umbe

r

4 5 6 7 8 9 10 11 12 13 14 15 18 19 20 21 22 Channel Number

1.0

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0.0 4

5

6

7 8

9

10

11

12 1

3 1

4 15

18

19

20

21

22

Desroziers’ method estimate of interchannel portion of observation

error correlation matrix for ATMS

1.0

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0.0 4

5

6

7

8 9

10

11

12 1

3 14

15

18 1

9 20

21

22 4 5 6 7 8 9 10 11 12 13 14 15 18 19 20 21 22

Chan

nel N

umbe

r

Current Treatment

Channel Number

Temperature Temperature

Moisture Moisture

Current observation error correlation matrix used for ATMS,

and for ALL observations

13

Iterating Desroziers

Old Statistical Estimate

Chan

nel N

umbe

r

4 5 6 7 8 9 10 11 12 13 14 15 18 19 20 21 22 Channel Number

1.0

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0.0 4

5

6

7 8

9

10

11

12 1

3 1

4 15

18

19

20

21

22

Desroziers’ method estimate of interchannel portion of observation

error correlation matrix for ATMS

4

5

6

7

8 9

10

11

12 1

3 14

15

18 1

9 20

21

22 4 5 6 7 8 9 10 11 12 13 14 15 18 19 20 21 22

Chan

nel N

umbe

r

New Statistical Estimate

Channel Number

The change is not large, which is (weak) evidence that the procedure may converge

14

Practical Implementation: What about Convergence?

• The condition number of a matrix X is defined by σmax(X)/σmin(X), which is the ratio of the maximum singular value of X to the minimum one (same as the eigenvalue ratio if X is symmetric and positive definite).

• It is a commonly-used barometer for the numerical stability of matrices.

• How to improve conditioning: 1. Increase the diagonal values (additively) of the matrix until

the desired condition number is reached (e.g. Weston et al. (2014)).

2. Preconditioning 3. Find a positive definite approximation to the matrix such

that the condition number is constrained.

Condition Number Constrained Correlation Matrix Approximation

We want to find a positive definite approximation to the matrix ,

ˆp k

X X Xis minimized. In the trace norm, simply set all of the smallest singular values equal to the value that gives the desired condition number, and then reconstruct the matrix with the singular vectors to obtain the approximate matrix.

Another method, used by Weston et al. 2014, is to increase the diagonal values of the matrix until the desired condition number is reached. We believe there is better theoretical justification for the first method using the Ky Fan p-k norm (Tanaka, M. and K. Nakata, 2014)

1

,1

k pp

ip ki

X XThe Ky Fan p-k norm of m n X where

i X denotes the ith largest singular value of X

When p=2 and k=n, it is called the Frobenius norm;when p=1 and k=n, it is called the trace norm.

16

Condition Number Constrained Correlation Matrix Approximation

0

0.05

0.1

0.15

0.2

Cond. Number = 18 Cond. Number = 18

Chan

nel N

umbe

r

5 7 9 11 13 15 19 21

Channel Number

0

0.05

0.1

0.15

0.2 5 7 9 11 13 15 19 21

Channel Number

5

7

9

11

1

3

15

1

9

21

5

7

9

11

1

3

15

1

9

21

Difference between original and conditioned correlation matrix

Trace Norm Increase Diagonal

Convergence and the Cauchy Interlacing Theorem

Cauchy interlacing theoremLet A be a symmetric n × n matrix. The m × m matrix B, where m ≤ n, is called a compression of A if there exists an orthogonal projection P onto a subspace of dimension m such that P*AP = B. The Cauchy interlacing theorem states:

Theorem. If the eigenvalues of A are α1 ≤ ... ≤ αn, and those of B are β1 ≤ ... ≤ βj ≤ ... ≤ βm, then for all j < m + 1,

Notice that, when n − m = 1, we have αj ≤ βj ≤ αj+1, hence the

name interlacing theorem.

j j n m j

What happens when radiance profiles are incomplete (i.e., at a given location, some channels are missing, usually due to failing QC checks)?

http://en.wikipedia.org/wiki/Compression_(functional_analysis)

18

Conjugate Gradient Convergence

Goal

C

19

Using the Desroziers Diagnostic for IASI Channel Selection

• Water vapor channels 2889, 2994, 2948, 2951, and 2958 have very high error correlation (>0.98)

• The eigenvectors corresponding to the 4 smallest eigenvalues project only on to these 5 channels

• It makes sense to use the Desroziers diagnostic to do a posteriori channel selection, which has the bonus of improving the condition number of the correlation matrix, and thus solver convergence

20

Conjugate Gradient Convergence

Goal

21

• The Desroziers error covariance estimation methods can quantify correlated observation error

• Minimal changes can be made to the estimated error correlations to fit operational time constraints

• After accounting for correlations, variances could potentially be reduced (assuming correct background)

• Correctly accounting for vertically correlated observation error in data assimilation has already yielded superior forecast results at multiple operational NWP centers without a large computational cost

• What about horizontally correlated error?

Main Conclusions

Correlated Error

RedSat

140 180 220 260 300 °K

Uncorrelated Error

WhiteSat

140 180 220 260 300 °KTruth (No Error)

True Atmosphere

• Imagine two hypothetical satellite instruments looking down on Hurricane Sandy• WhiteSat sees a fuzzy version of the truth; current DA systems handle uncorrelated error well• RedSat sees a warped version of the truth; current DA systems handle correlated error poorly• High resolution (i.e. future) instruments are more likely to have correlated error, which can be

much more subtle than this example 22

Visualizing Horizontally Correlated Observation Error

140 180 220 260 300 °K

23

• How can we best estimate errors in Desroziers/Hollingsworth- Lönnberg diagnostics?– Should we expect agreement between different methods?– Will the Desroziers diagnostic converge if both R and B are

incorrectly specified?– Amount of data required to estimate covariances? Seasonal

dependence?– Best methods to symmetrize the Desroziers matrix?

• How to gauge improvement?– Do we also need to adjust to see overall improvement to the

system? – How do we maintain the correct ratio for DA?

• What about convergence?– Should we do an eigenvalue scaling to improve the condition

number?

Discussion

24

References

IMPERFECTOBSERVATIONS

Hollingsworth, A. and P. Lonnberg, 1986. The statistical structure of short-range forecast errors as determined from radiosonde data. Part I: The wind field. Tellus, 38A, pp 111-136.

Desroziers, G., et al., 2005. Diagnosis of observation, background and analyis-error statistics in observation space. Quart. J. Roy. Meteor. Soc., 131, pp. 3385-3396.

Bormann , N. and P. Bauer, 2010: Estimates of spatial and interchannel observation-error characteristics for current sounder radiances for numerical weather prediction. I: Methods and application to ATOVS data. Quart. J. Roy. Meteor. Soc., 136, pp. 1036-1050.

Gorin, V. E., and M. D. Tsyrulnikov, 2011. Estimation of Multivariate Observation-Error Statistics for AMSU-A data. Mon. Wea. Rev., 139, pp. 3765-3780.

Stewart, L. M. et al., 2013. Data assimilation with correlated observation errors: experiments with a 1-D shallow water model. Tellus, 65.

Weston, P. P. et al., 2014. Accounting for correlated error in the assimilation of high-resolution sounder data. Quart. J. Roy. Meteor. Soc., 140, pp. 2420-2429.

Tanaka, M., and K. Nakata, 2014. Positive definite matrix approximation with condition number constraint. Optim. Lett., 8, pp. 939-947.

Documents

Bill Campbell and Liz Satterfield JCSDA Summer Colloquium on Satellite Data Assimilation 27 Jul - 7 Aug 2015 Accounting for Correlated Satellite Observation